Introduction
Bits
An established atomic unit for conveying and representing information is a binary digit or, briefly, a bit.
A bit can assume two possible states, namely either 0 or 1.
Of course, we can associate alternative meanings with state 0 and state 1. For example, 0 may mean false and 1 may mean true, or 0 may mean white and 1 may mean black.
Historical remark. Claude E. Shannon [A mathematical theory of communication. Bell System Tech. J. 27 (1948) 379–423, 623–656] attributes the suggestion of using the word “bit” in this context to J. W. Tukey.
Data
In computing, data consists of bits.
In fact, it makes sense to define the word “data” in a computing context to mean “a sequence of bits”. This is the definition that we adopt.
This definition is convenient because it allows us to easily and unequivocally measure the amount of data that we have, that is, the number of bits in the sequence. This also makes it easy to determine whether we can store the data in a storage device with a known storage capacity in bits.
Another reason to define “data” to mean “a sequence of bits” is that it forces us to adopt an appropriately low-level perspective on computing, given our goal of understanding the computer as a machine.
A sequence of bits is something that is simple and tangible enough for a machine to operate on.
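The definition can be sketched in Scala as follows. This is only an illustration under stated assumptions: the object `DataAsBits`, the type alias `Data`, and the functions `fromString`, `amount`, and `fitsIn` are hypothetical names introduced here, and modelling a bit as a `Boolean` is a choice made for readability, not for efficiency.

```scala
object DataAsBits {
  // Assumption for illustration: a bit is modelled as a Boolean
  // (false = 0, true = 1) and data as an immutable sequence of bits.
  type Data = Vector[Boolean]

  // Parse a string of '0' and '1' characters into a sequence of bits.
  def fromString(s: String): Data = {
    require(s.forall(c => c == '0' || c == '1'), "only 0 and 1 are allowed")
    s.map(_ == '1').toVector
  }

  // The amount of data is simply the number of bits in the sequence.
  def amount(d: Data): Int = d.length

  // The data fits if it has at most as many bits as the device can hold.
  def fitsIn(d: Data, capacityInBits: Long): Boolean =
    amount(d).toLong <= capacityInBits
}
```

For instance, `DataAsBits.amount(DataAsBits.fromString("0110"))` evaluates to 4, so this data fits in a device with a capacity of 4 bits but not in one with a capacity of 3 bits.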
Data and its format
By our definition, data is nothing but a mundane sequence of bits.
What makes data interesting is that we can agree on conventions and standards on what the data means or represents. Such agreed conventions and standards that attach a meaning to the data are called the format of the data.
Example 1. Here is a particular sequence of 27550 bits:
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001111111000000001100000011100001100000000000110000000000001111100001111100111111100111111001100000110111111111101111111001111100011111111110000000000000000000000000000000000000000000000000000011100000000000111000001110001110000000000011100000000001100110001111011000111000011100000111000011001111100010011100001110010000111110010000000000000000000000000000000000000000000000000000000110000000000011110000111110111100000000001110000000000100000000110000000001100001000000011110001100000110000000010000110000000000011000000000000000000000000000000000000000000000000000000000001100000000011011000011111111110000000001101100000000010001000110000000000110000110000001111100110000011000000011100011001000000001100000000000000000000000000000000000000000000000000000000000110000000001100110001100111011000000000110011000000001111111011000000000011000011111000110110011000001100000000110001111111000000110000000000000000000000000000000000000000000000000000000000011000000000110011000110001001110000000011011100000000011101111100000000001100001111100011001101100000110000000011000001101110000011000000000000000000000000000000000000000000000000000000000001100000000011111110011000000011000000011111111000000000000011010000000000110000110000001100111110000011000000001100000000011000001100000000000000000000000000000000000000000000000000000000000110000000001000011001100000001100000001100001100000001100001101100000000011000011000000110001111000001100000000110000100001100000110000000000000000000000000000000000000000000000000000000000111100000001100000110110000000111000000110000011000000111111110011111110
01111100110000001100001110000011000000011110001111111000001100000000000000000000000000000000000000000000000000000000011111100000011000001101100000000110000001100000110000000111111000011111001111110011111110110000011000001100000011111100011111000000110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001111111011111100000011111000110000011000000000111111111001000001000111111000000011111001100000000110000001100000110011111100111111111100000000000000000000000000000000000000000000000000000000111111101111110000001111110011100001100000000011111111101100000100111111000000001111110110000000011100000111000011011111110011111111100000000000000000000000000000000000000000000000000000000001000000110001100001100011001111001110000000000000110000110000010011000000000000100011011000000001110000011110001101100000000000110000000000000000000000000000000000000000000000000000000000000110000011000110000100000110111111111100000000000011000011000001101100000000000010001101100000001101100001111000110010000000000011000000000000000000000000000000000000000000
00000000000000000001100000010001100011000001101101111011000000000000110000011000111001101000000000110111011000000011001100011111001100110000000000110000000000000000000000000000000000000000000000000000000000000111111001111100001100000110110011001110000000000011000001111111100111110000000011111001100000001100110001100110110011111000000011000000000000000000000000000000000000000000000000000000000000011000000111111000110000011011000000011000000000001100000110000110011000000000001100000110000001111111100110011111001100000000001100000000000000000000000000000000000000000000000000000000000001100000011001110011000001101100000001100000000000110000011000011001100000000000110000011000000111111110011000111100110000000000110000000000000000000000000000000000000000000000000000000000000110000001100011100110001100110000000111000000000011000001100001100110000000000011000001110000011000001101100001110011000000000011000000000000000000000000000000000000000000000000000000000000011000000110000110001111110011000000001100000000001100000110000110011111100000001100000111111101100000110110000011001111110000001100000000000000000000000000000000000000000000000000000000000000000000010000000000011100000000000000100000000000100000010000001000111110000000010000001111110000000000000000000100011111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000110000001100000111100000011110000000000001111100000000111100000011100000001111000100000000000011100000000000000000100001100000100000010000000000000000000000000000000000000000000000000000111111110011111100111111000011111100000000001111110000000111111100011111000001111110011000001100011111000000001100000110011111100010000011000000000000000000000000000000000000000000000000000010000110001100111001100110001100011000000000001100000000011000110001100110000001100001110000110011000110000000111000110001100111001000001100000000000000000000000000000000000000000000000000000000111001100001100100011000010001100000000000110000000001100011000100011000000110000111100011001000000000000001110011001100001100100000110000000000000000000000000000000000000000000000000000000111000110000010010001100001100110000000000011000000000110111100010001100000011000011111001101100000000000000011111000110000010010000011000000000000000000000000000000000000000000000000000000111000011000001101111110000111110000000000001100000000011011111001111110000001100001101110110110001110000000000111100011000001101000001100000000000000000000000000000000000000000000000000000111000001000000110111111000011110000000000000110000000001100000110111111000000110000110011011011001111100000000001100001000000110100000110000000000000000000000000000000000000000000000000000111000000110000010011001110001100000000000000001100000000110000011011001110000011000011000111101100000110000000000110000110000011011000011000000000000000000000000000000000000000000000000000111000000011000011001100011100110000000000000000110000000011000001101100011100001100001100001110011000110000000000001000011000011001100011100000000000000000000000000000000000000000000000000011111111000111111000110000110011000001100000001111100000001111111100110000111001111100110000111001111111000000000000100000111111000011111100000000000000000000000000000000000000000000000
00000111111110000111100001100000100010000011000000011111000000011111110001100000100111111001000000110001111100000000000001000000111100000011110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111111110111111000011111001000001100110000011000111110001100000000111110000011111000111111001111110001111100000000111111000011111100110000011000111110001100000110011111100000000000000000000101111010111010000011100110110000110011100001100111011100110000000111011100011100100001110000110000001100000000000011000110001100000011100001100111011100111000011011100111000000000000000000000011000001000000011100000011000011001111000110011000010011000000011000110001000000000011000011000000100000000000001100011000100000000111001100011000110011110001101100001110000000000000000000001100000110000001100000001100001100111100011011000001101100000011000001101100000000001100001100000011001000000000110011100011000000001110110011000001101111100110010000011000000000000000000000110000011111000110000000111111110011011001101100000110110000001100000110110000100000110000111110001111111000
00001101111100111110000001111000110000011011011101101110000110000000000000000000001100000111110001100000001111111100110011011011000001101100000011000001101100111100001100001111100001111111000000110111111011111000000011000011000001101100110110011000001000000000000000000000110000011000000110000000110000110011001111101100000110110000001100000110110000011000110000110000000000001100000011000001101100000000000100001100000110110001111001100001100000000000000000000011000001100000001100000011000001001100011110110000111011000000110000111001100011000011000011000000110000110000001100000110110000000000010000010000111011000011100110000110000000000000000000001100000111000000111111111100000110110000111001111111001111110001111111000111011100011111001110000001111111000000111001110011100000000001100001111111001100000110011011111000000000000000000000110000011111110001111110110000011011000001100011111000111111100011111000001111100011111100011111100111111000000011111110000111111000000110000011111000010000001001111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000011000
11110000100000010111111000000000111111100110000110000000100000000111110001111110110000001000011000000111111111101111110000111100001000000100000000000000000000000000000000000000001100000110011111100010000001011111100000000001111100111100011000000111000000111111000111111011100000100001110000011111111110111111000111111001110000011000000000000000000000000000000000000000011000110001000011001000000101100011000000000011100011111011110000011110000011000100000110001111000011000111000000000110000000110000111000110111100001100000000000000000000000000000000000000000110110001100000100100000010110001100000000000110001111111111000011011000011000000000011000111110001100110110000000011000000011000011000001011111000110000000000000000000000000000000000000000001111000110000011010000001001000110000000000011000110011101110001100110001100000000001100001111100110011001100000001100000001100001100000100111110011000000000000000000000000000000000000000000011000011000001101000000100111110000000000001100011000110011000110011000110011110000110000110111011001100110000000110000000110000100000011011011101100000000000000000000000000000000000000000001100001100000110110000110011111100000000000110001100000001100011111110011000001100011000011001111100111111100000011000000011000010000001101100111110000000000000000000000000000000000000000000110000110000110011000011001100111000000000011000110000000111011100011001100001100001100001100011110111000110000001100000001100001100001100110001111000000000000000000000000000000000000000000011000011100111000110111100110001110000000001100011000000001101100000110011000110000111000110000111011000001100000110000000111000111001110011000011100000000000000000000000000000000000000000001100000111111000011111100011000011000000011111101100000000110110000011001111110001111111011000001101100000110000011000000111111001111110001100000110110000000000000000000000000000000000000000010000000111000000011000001000000000000001100000000000000011010000000100001110000011000000000000010000000001
00000010000001100000000110000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000110110100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000011011010111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000001101101010111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000010111101011010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000111100101101000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000001000000000000000000000000001110001110010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000111000011111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000001111110011011111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111111111000000000000000000000000000000000000000100000000000000000000001100111001111000001111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111100000111000000000000000000000000000000000000010000000000000000000011100111111111000000000111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001111000000000000110000000000000000000000000000000000001100000000000000000001011110000111000000000000110000000000000000000000000000000000000000000000000000000000000000000000000000000000000111100000000000000001000000000000000000000000000000000000010000000000000000001100110000000100000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000001110000000000000000000110000000000000000000000000000000000001100000000011100000111111001000010011000000000110000000000000000000000000000000000000000000000000000000000000000000000000000000011100000000000000000000001000000000000000000000000000000000000010000000011111111111100000000001111111000000001000000000000000000000000000000000000000000000000000000000000000000000000000000011000000000000000000000000110000000000000000000000000000000000000100000011000110000000000000000011111110000000100000000000000000000
00000000000000000000000000000000000000000000000000000000001100000000000000000000000000100000000000000000000000000000000000000100000101101100000000000000000001111100000001100000000000000000000000000000000000000000000000000000000000000000000000000001100000000000000000000000000001000000000000000000000000000000000000000000010011010000000000000001111111110000000010000000000000000000000000000000000000000000000000000000000000000000000000001100000000000000000000000000000100000000000000000000000000000000000000000001001101000001111110000001111111000000001000000000000000000000000000000000000000000000000000000000000000000000000000110000000000000000000111100000011000000000000000000000000000000000000000000100110111111111111100000010011000000000100000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000011111000001100000000000000000000000000000000000000000011000011111101111110000001000000000000010000000000000000000000000000000000000000000000000000000000000000000000000001000000000000101110001111100000001000000000000000000000000000000000000000000100011001100110110000000100000000000001000000000000000000000000000000000000000000000000000000000000000000000000000100000000000011111000011110000000110000000000000000000000000000000000000000001111000100010000000000011000000000000100000000000000000000000000000000000000000000000000000000000000000000000000010000000000001111111011111100000011000000000000000000000000000000000000000000000000010001000000000001100000000000010000000000000000000000000000000000000000000000000000000000000000000000000001000000000000111111111111111100000100000000000000000011110000000000000000000000000001001100000000000110000110000001000000000000000000000000000000000000000000000000000000000000000000000000000110000000000000111111111111111000001000000000000000011111111100000000000000000000000100110000000000011010011111000100000000000000000000000000000000000000000000000000000000000000000000000000011000000000000011111111111100110000100000000
00000011100000011111110000000000000000001011000000000000111110111110001000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000111111111011000100101100000000000111000000000000011111110000000000000111100000001100011011111110000100000000000000000000000000000000000000000000000000000000000000000000000000001000000000000011111111100110111110110000000001110000000000000000000111111110000000111110000001111001111110011000010000000000000000000000000000000000000000000000000000000000000000000000000000100000000000001111111110011111100011100000011100000000000000000000000000111111100110111000000101100111100011000011000000000000000000000000000000000000000000000000000000000000000000000000001111111100000000111111111111111111111110000111000000000000000000000000000000000111110111100000111100011000011000001100000000000000000000000000000000000000000000000000000000000000000000000011111111111100001011111111111111111110001001111000000000000000000000000000000000001110011111111110110001100011000000110000000000000000000000000000000000000000000000000000000000000000000000111111111111111010001111111111111111111010111110000000000000000000000000000000000000011111100111110110000110011100000010000000000000000000000000000000000000000000000000000000000000000000000011111111111111100010111111111111111111111111100000000000000000000000000000000000000000111110001111111111111111000000011000000000000000000000000000000000000000000000000000000000000000000000001111111111111111100011111111111111111111111100000000000000000000000000000000000000000111000000111011000111111111000000100000000000000000000000000000000000000000000000000000000000000000000000111111111111111100011111111111111111111111110000000000000000000000000000000000000000000000000001111100000000111111110110000000000000000000000000000000000000000000000000000000000000000000000011111111111111111001111111111111111111111110000000000000000000000000000000000000000000000000001111110000000000000111111100000000000000000000000000000
00000000000000000000000000000000000000000000000001011111100001111111111111111100001000000000000000000000000000000000000000000000000000001000000000000000000000111111111000000000000000000000000000000000000000000000000000000000000000000000000100111111000111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111111000000000000000000000000000000000000000000000000000000000000000000010011111000011111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111111000000000000000000000000000000000000000000000000000000000000001101111110001111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111110000000000000000000000000000000000000000000000000000000000010111110000111111111111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111111110000000000000000000000000000000000000000000000000000001011111100011111111111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111111110000000000000000000000000000000000000000000000000111111100001111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111111100000000000000000000000000000000000000000000011111111000111100000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111110000000000000000000000000000000000000000111111101111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111110000000000000000000000000000000000011111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111100000000000000000000000000000001111111110000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111111000000000000000000000000001111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111110000000000000000000000111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111110000000000000000111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111110000000001110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011000000011100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010001111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001110000111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000001111000001110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111000001111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111100001111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111100001111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111100000111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111100000111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001111111110000011110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100010001110000011110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Suppose now 0 means white and 1 means black. Suppose furthermore that we split the data into 145 segments of 190 consecutive bits (\(145 \times 190 = 27550\)), and stack the segments on top of each other to form an array with 145 rows and 190 columns. What we obtain is the following black-and-white rendering of a Dilbert strip:
Observe in particular that to reconstruct the visual rendering as a \(145\times 190\) array, we needed to agree on a format for the data. Namely, we agreed that the 27550 bits represent a black-and-white image of dimensions \(145 \times 190\), with a 0 indicating white and a 1 indicating black.
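The splitting and stacking step of Example 1 can be sketched in a few lines of Scala. The object `BitImage` and the function `render` are hypothetical names introduced for this illustration, and drawing a 1 as `#` and a 0 as a blank is just one possible way to show black and white in a text terminal.

```scala
object BitImage {
  // Split a string of '0'/'1' characters into rows of `columns` bits each
  // and draw 0 as a blank (white) and 1 as '#' (black).
  def render(bits: String, columns: Int): String = {
    require(columns > 0 && bits.length % columns == 0,
      "the number of bits must be a multiple of the number of columns")
    bits.grouped(columns)
      .map(row => row.map(b => if (b == '1') '#' else ' '))
      .mkString("\n")
  }
}
```

As a small usage example, `BitImage.render("011101000101110", 5)` arranges 15 bits into 3 rows of 5 columns and draws a tiny ring. Calling `render` with the 27550 bits of Example 1 and 190 columns would produce the 145 rows described above.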
Example 2. Here is a particular sequence of 64 bits:
0100000000001001001000011111101101010100010001000010110100011000
In this case we are in fact looking at a sequence that, if interpreted as a double-precision floating-point number conforming to the IEEE 754 standard (in more precise terms, the format binary64 in IEEE Std 754-2008 IEEE Standard for Floating-Point Arithmetic), is an approximation of \(\pi\).
Incidentally, this sequence of 64 bits is exactly the internal representation of scala.math.Pi. Observe again that the data in itself is just a sequence of 64 bits unless we agree on a format for the data, that is, that the data has some agreed meaning beyond its bits.
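The interpretation used in Example 2 can be tried out directly. The following is a minimal sketch: the object `Binary64` and its functions `decode` and `encode` are hypothetical names introduced here, while `java.lang.Long.parseUnsignedLong`, `java.lang.Double.longBitsToDouble`, and `java.lang.Double.doubleToLongBits` are standard Java library methods available from Scala on the JVM.

```scala
object Binary64 {
  // Interpret a string of exactly 64 '0'/'1' characters as an
  // IEEE 754 binary64 value (a Scala Double).
  def decode(bits: String): Double = {
    require(bits.length == 64, "expected exactly 64 bits")
    // parseUnsignedLong accepts all 64-bit patterns, including those
    // whose leading bit is 1.
    val raw = java.lang.Long.parseUnsignedLong(bits, 2)
    java.lang.Double.longBitsToDouble(raw)
  }

  // The reverse direction: the 64 bits of a Double, zero-padded on the left.
  def encode(x: Double): String = {
    val s = java.lang.Long.toBinaryString(java.lang.Double.doubleToLongBits(x))
    ("0" * (64 - s.length)) + s
  }
}
```

Applied to the 64 bits of Example 2, `Binary64.decode` returns a value that compares equal to `scala.math.Pi`, and `Binary64.encode(scala.math.Pi)` gives back the same 64 bits.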
Example 3. Here is a particular sequence of 256 bits:
0010000101000111011100100110010101100101011101000110100101101110011001110111001100100000011001100111001001101111011011010010000001110100011010000110010100100000011100000110110001100001011011100110010101110100001000000101101001101111011100100111000000100001
Suppose we receive this sequence from outer space. In such a situation, one possible assumption is that this sequence is intended to initiate a dialogue and is created by someone or something who is capable of conversing using human conventions and terms.
So why not assume that the sequence encodes a string of Unicode characters using the UTF-8 character encoding? (Currently, UTF-8 is the dominant byte encoding for HTML pages on the World Wide Web.) If we follow this intuition, we discover that the 256 bits encode the following string of 32 characters:
!Greetings from the planet Zorp!
It appears that the Zorpians are in a conversational mood, and have no problem conversing using Earth standards. Of course, again the data is just a sequence of bits, so in principle we could just as well be looking at a black-and-white image, or, say, a sequence of four double-precision floating-point numbers conforming to IEEE 754-2008.
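The decoding step of Example 3 can be sketched as follows. The object `Utf8Bits` and its functions `decode` and `encode` are hypothetical names introduced for this illustration. The sketch assumes that the bits are grouped into 8-bit bytes from left to right, which is the grouping under which the 256 bits above yield the greeting.

```scala
import java.nio.charset.StandardCharsets

object Utf8Bits {
  // Interpret a string of '0'/'1' characters as UTF-8 encoded text.
  def decode(bits: String): String = {
    require(bits.length % 8 == 0, "expected a whole number of 8-bit bytes")
    // Each group of 8 bits becomes one byte of the encoded text.
    val bytes = bits.grouped(8).map(b => Integer.parseInt(b, 2).toByte).toArray
    new String(bytes, StandardCharsets.UTF_8)
  }

  // The reverse direction: the UTF-8 bytes of a string, written out as bits.
  def encode(text: String): String =
    text.getBytes(StandardCharsets.UTF_8).map { b =>
      val s = Integer.toBinaryString(b & 0xFF) // treat the byte as unsigned
      ("0" * (8 - s.length)) + s               // zero-pad to 8 bits
    }.mkString
}
```

For example, `Utf8Bits.decode("0100011101101111")` evaluates to `"Go"`, and encoding the 32-character greeting with `Utf8Bits.encode` produces a sequence of 256 bits.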
Types of data
The examples above exposed us to two basic types of data:
Numerical data: data that encodes numbers.
Textual data: data that encodes a sequence of characters over a fixed set of characters.
Data may of course represent many other types of information. Data may be sound, video, executable binaries, printing instructions to a 3D printer, or essentially whatever we like, as long as we can agree on a format.
The two basic types, however, occur frequently enough that they should be known to all serious programmers. In fact, we have already been exposed to such data in programming terms, even if we may not have paid attention to the fact that we are working with sequences of bits.
Let us now recall a bit of Scala. In particular, we observe that the division into two basic types is rather prominent in the basic types used in Scala. Indeed, most Scala programs involve:
Numerical data: integers (Byte, Short, Int, and Long) or floating point numbers (Float and Double), or
Textual data: individual characters (Char) or strings (String).
Fixed-length data, words, word length
At the hardware level, computers process data in units that consist of a fixed number of bits. Such units are called words.
The length of a word is the number of bits in it. The maximum length of a word that is supported by the processor for general-purpose computations is called the word length of the architecture.
For example, you may have heard that your computer has, say, a 64-bit central processing unit (CPU). In practice this means that the processor is physically built to compute with 64-bit words of data when it executes one operation. In most cases the hardware also offers direct support to compute with smaller words than the full word length. In particular, a 64-bit architecture typically supports operations on 32-bit, 16-bit, and 8-bit words.
Although Scala is a hardware-independent language, it has been designed to take advantage of the hardware features that are available. Accordingly, the basic value types in Scala have fixed lengths that parallel the word operations typically supported in hardware.
More precisely, the integer value types in Scala have the following fixed lengths in bits:
Byte (8 bits),
Short (16 bits),
Int (32 bits), and
Long (64 bits).
(Some architectures may not support 64-bit operations in hardware, in which case such operations will be simulated in software using the available hardware. For example, a 32-bit processor may simulate a 64-bit operation using several 32-bit operations.)
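As an illustration of such a simulation, here is a minimal sketch that adds two 64-bit words using only 32-bit additions and a carry. The object WideAdd and its functions are hypothetical names; this is one possible way to do it, not a description of what any particular processor or the Java Virtual Machine actually does.

```scala
object WideAdd {
  // Split a 64-bit word into its high and low 32-bit halves.
  def split(x: Long): (Int, Int) = ((x >>> 32).toInt, x.toInt)

  // Join two 32-bit halves back into one 64-bit word.
  def join(hi: Int, lo: Int): Long =
    (hi.toLong << 32) | (lo.toLong & 0xFFFFFFFFL)

  // Add two 64-bit words given as (high, low) halves, 32 bits at a time.
  def add64(aHi: Int, aLo: Int, bHi: Int, bLo: Int): (Int, Int) = {
    val lo = aLo + bLo   // 32-bit addition, wraps around modulo 2^32
    // The low halves overflowed exactly when the wrapped sum, viewed as
    // an unsigned number, is smaller than one of the operands.
    val carry = if (java.lang.Integer.compareUnsigned(lo, aLo) < 0) 1 else 0
    val hi = aHi + bHi + carry
    (hi, lo)
  }

  // Convenience wrapper: add two Longs using only the 32-bit routine.
  def add(x: Long, y: Long): Long = {
    val (xHi, xLo) = split(x)
    val (yHi, yLo) = split(y)
    val (hi, lo) = add64(xHi, xLo, yHi, yLo)
    join(hi, lo)
  }
}
```

The result of WideAdd.add(x, y) agrees with the built-in x + y for all Long values, including the case where the low halves overflow and a carry must be passed to the high halves.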
The number of states for a sequence of b bits. A fundamentally important fact to understand about a \(b\)-bit word is that each of its \(b\) bits can independently be in exactly one of two possible distinct states, either 0 or 1. Thus, the word can be in exactly one of \(2^b = 2\cdot 2\cdot\ldots\cdot 2\) distinct states.
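In particular this implies the following number of distinct states for the typical word lengths:
8 bits: \(2^{8} = 256\) states,
16 bits: \(2^{16} = 65536\) states,
32 bits: \(2^{32} = 4294967296\) states, and
64 bits: \(2^{64} = 18446744073709551616\) states.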
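The state counts can be checked with a short computation. The object WordStates is an illustrative name; BigInt is used because \(2^{64}\) does not fit into a Long.

```scala
object WordStates {
  // Number of distinct states of a b-bit word: 2 multiplied by itself b times.
  def states(b: Int): BigInt = BigInt(2).pow(b)

  // The number of different values of type Byte, counted from its range.
  val byteValues: Int = Byte.MaxValue.toInt - Byte.MinValue.toInt + 1
}
```

Here WordStates.states(8) equals WordStates.byteValues, namely 256, which reflects the fact that every one of the \(2^8\) states of an 8-bit word is used to represent exactly one value of type Byte.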
Variable-length data
Data may also have a variable length. In this case the data is processed one word at a time at the hardware level.
For example, think about a string such as "!Greetings from the planet Zorp!".
A string is a sequence of characters whose length may be arbitrary, that is, the length is not restricted by the format of the data. (We will review common string formats in more detail below.)
Variable-length data is in practice restricted only by our ability to store the data, which depends on the available storage capacity, ranging from cache memory inside the processor, to main memory, to secondary storage devices (such as an individual hard drive), to networked storage solutions for massive data. Of course, not all data needs to be stored. That is, data may be processed without storing it.
Indeed, in a great many applications the data has an unknown length. Think about the sequence of key presses that a user makes at a keyboard. A key is pressed, then released, then the next key is pressed, and so forth. We simply do not know how many key presses and releases will occur, and in what order. The same holds if we are running a networked device that accepts incoming data from the network that must be processed somehow.
In such settings one rather focuses on the rate at which the data arrives, and the latency within which the program (or device) must process the data.
For example, a program must react to user key presses rather quickly (with low latency), or the user experience deteriorates. Similarly, when routing network traffic, the traffic must be forwarded to its intended destinations at the same rate as it arrives, or the traffic will in general overwhelm the device.
Units of measure for data
One bit (1 bit, or 1 b) is the standard unit of measure for data and information. A common derived unit is one byte (1 byte, or 1 B) which is in practice understood to mean exactly 8 bits. A measurement in bits or bytes is often combined with an appropriate prefix indicating the order of magnitude. Here it is customary to either use the decimal base (base 10) or the binary base (base 2) to indicate magnitudes. If base 10 is used, the prefixes for order of magnitude follow the familiar SI definitions:
Prefix  Short  Order
kilo    k      \(10^{3}=1000\)
mega    M      \(10^{6}=1000000\)
giga    G      \(10^{9}=1000000000\)
tera    T      \(10^{12}=1000000000000\)
peta    P      \(10^{15}=1000000000000000\)
exa     E      \(10^{18}=1000000000000000000\)
zetta   Z      \(10^{21}=1000000000000000000000\)
yotta   Y      \(10^{24}=1000000000000000000000000\)
Example 1. One gigabit means one billion bits.
Example 2. If we use one square millimeter to store one bit of data, one megabit uses up one square meter, one terabit uses up one square kilometer, and one zettabit uses up more than the entire surface area of planet Earth. (Fortunately, in practice one bit can be stored in a considerably smaller area than one square millimeter!)
Example 3. It takes 8000 seconds (a little over 2 hours) to transfer one terabyte of data over a one-gigabit-per-second network link. Transferring one exabyte over the same link takes about 250 years.
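The arithmetic behind Example 3 can be sketched as follows. The object TransferTime and its members are illustrative names; the sketch assumes SI (base 10) prefixes, 8 bits per byte, and a year of 365 days.

```scala
object TransferTime {
  val Giga: BigInt = BigInt(10).pow(9)
  val Tera: BigInt = BigInt(10).pow(12)
  val Exa: BigInt  = BigInt(10).pow(18)

  // Seconds needed to move the given number of bytes over a link
  // with the given rate in bits per second (rounded down).
  def seconds(bytes: BigInt, bitsPerSecond: BigInt): BigInt =
    bytes * 8 / bitsPerSecond

  // Whole years in the given number of seconds, with 365-day years.
  def years(secs: BigInt): BigInt = secs / (365 * 24 * 3600)
}
```

With these definitions, TransferTime.seconds(TransferTime.Tera, TransferTime.Giga) is 8000, and applying TransferTime.years to the corresponding figure for one exabyte gives 253 whole years.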
Alternatively, orders of magnitude are measured in the binary base 2 using the IEC 80000-13 definitions:
Prefix  Short  Order
kibi    Ki     \(2^{10}=1024\)
mebi    Mi     \(2^{20}=1048576\)
gibi    Gi     \(2^{30}=1073741824\)
tebi    Ti     \(2^{40}=1099511627776\)
pebi    Pi     \(2^{50}=1125899906842624\)
exbi    Ei     \(2^{60}=1152921504606846976\)
zebi    Zi     \(2^{70}=1180591620717411303424\)
yobi    Yi     \(2^{80}=1208925819614629174706176\)
Here the prefix “kibi” stands for “kilobinary”, “mebi” stands for “megabinary” and so forth.
Warning
It is an unfortunate and common misuse to report the amount of data using the SI prefixes (that is, what appears to be base 10) even when base 2 is actually intended. In particular, never assume that a quantity reported with SI prefixes actually uses base 10 if that assumption would lead to system failure should base 2 have been intended. For example, do not assume that \(1\) GB means exactly \(10^9\) bytes (the SI definition of \(1\) GB) if reserving storage space (or transfer bandwidth) for only \(10^9\) bytes (per second) would cause loss of data (or an insufficient data transfer rate) when \(2^{30}\) bytes (per second) was actually intended.
Note
Observe that \(2^{10m}\) and \(10^{3m}\) diverge from each other at an exponential rate as \(m\) increases. For \(m=1\) the divergence is over \(2\%\) whereas for \(m=8\) it is already over \(20\%\)!
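The size of the gap is easy to compute, since \(2^{10m}/10^{3m} = 1.024^m\). The following sketch evaluates it exactly; the object PrefixGap and the function excessPercent are illustrative names.

```scala
object PrefixGap {
  // By how many percent 2^(10m) exceeds 10^(3m).
  def excessPercent(m: Int): Double = {
    val binary  = BigDecimal(BigInt(2).pow(10 * m))
    val decimal = BigDecimal(BigInt(10).pow(3 * m))
    ((binary / decimal - 1) * 100).toDouble
  }
}
```

For \(m=1\) (kibi versus kilo) the excess is 2.4 percent, and for \(m=8\) (yobi versus yotta) it lies between 20 and 21 percent, in agreement with the note above.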
Witnessing the bits, in Scala
Now let us get in contact with the bits. This will in particular serve as motivation to study the bit-level formats for data.
To kick off our study of bits and data, we want to see the bits, at the console.
That is, we challenge you to take any value in any of the following types:
Byte
Short
Int
Long
Char
String
Float
Double
Then, use the functions tellAbout* defined below to tell about the value. (Let us not pay attention yet to how the functions actually work. For the time being we are happy just to see the bits.)
Each function
prints the sequence of bits that constitute the value, and
tells what the bits mean, in the meaning (format) that Scala attaches to the type.
Example 1. Assuming we have copy-pasted the tellAbout* functions into the console, let us take our favorite Int, say, let us take 123, and see what we get when we ask the function tellAboutInt to tell us about the value:
scala> val q = 123
q: Int = 123
scala> tellAboutInt(q)
I am a 32-bit word that Scala views as having format 'Int' ...
... my bits are, in binary, (00000000000000000000000001111011)_2
... or equivalently, in hexadecimal, 0x0000007B
... Scala prints me out as 123
... I represent the signed decimal integer 123
... I represent the unsigned decimal integer 123
All right. So an Int is in fact just a 32-bit word of data.
Example 2. Let us try the same with a negative integer. Here we see, among other things, how negative numbers are represented in binary using something called two’s complement representation. (The details of this representation need not concern us at this point.)
scala> val qq = -123
qq: Int = -123
scala> tellAboutInt(qq)
I am a 32-bit word that Scala views as having format 'Int' ...
... my bits are, in binary, (11111111111111111111111110000101)_2
... or equivalently, in hexadecimal, 0xFFFFFF85
... Scala prints me out as -123
... I represent the signed decimal integer -123
... I represent the unsigned decimal integer 4294967173
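The three views of an Int shown in Examples 1 and 2 can be reproduced with a few lines of Scala. This is a minimal sketch and not the source code of tellAboutInt; the object IntBits and its functions are hypothetical names.

```scala
object IntBits {
  // The 32 bits of x as a string of binary digits, with leading zeros.
  def binary(x: Int): String = {
    val s = Integer.toBinaryString(x)
    "0" * (32 - s.length) + s
  }

  // The same 32 bits as 8 hexadecimal digits.
  def hex(x: Int): String = f"0x$x%08X"

  // The same 32 bits read as an unsigned integer (needs a Long to fit).
  def unsigned(x: Int): Long = x & 0xFFFFFFFFL
}
```

For the value -123, the functions give the bit string 11111111111111111111111110000101, the hexadecimal form 0xFFFFFF85, and the unsigned reading 4294967173, matching the console output above.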
Example 3. A character is nothing but a sequence of bits that has been agreed to encode or represent that character.
scala> val c = 'n'
c: Char = n
scala> tellAboutChar(c)
I am a 16-bit word that Scala views as having format 'Char' ...
... my bits are, in binary, (0000000001101110)_2
... or equivalently, in hexadecimal, 0x006E
... Scala prints me out as n
... indeed, my bits acquire meaning via the Unicode standard
https://www.unicode.org/
... I represent a single 16-bit Unicode character '\u006E'
... as a signed decimal integer, I am 110
... as an unsigned decimal integer, I am 110
Example 4. A string is nothing but a sequence of characters. That is, a sequence of bits that splits into encodings of characters, each of which is a sequence of bits.
scala> val s = "nakkisoossi"
s: String = nakkisoossi
scala> tellAboutString(s)
I am compound data that Scala views as having format 'String' ...
... in essence, I am a sequence of 11 consecutive 16-bit words,
each of which Scala views as having format 'Char'
... the bits of this sequence are, in binary, (00000000011011100000000001100001000000000110101100000000011010110000000001101001000000000111001100000000011011110000000001101111000000000111001100000000011100110000000001101001)_2
... or equivalently, in hexadecimal, 0x006E0061006B006B00690073006F006F007300730069
... Scala prints me out as nakkisoossi
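The 16-bit words behind a Char and a String can be made visible in a similar way. Again this is only a sketch under our own naming (the object CharBits is hypothetical), not the source code of tellAboutChar or tellAboutString.

```scala
object CharBits {
  // The 16 bits of a Char as a string of binary digits, with leading zeros.
  def binary(c: Char): String = {
    val s = Integer.toBinaryString(c.toInt)
    "0" * (16 - s.length) + s
  }

  // The 16-bit words of a String, each written as 4 hexadecimal digits
  // and concatenated in order.
  def hex(s: String): String = s.map(c => f"${c.toInt}%04X").mkString
}
```

For example, CharBits.binary('n') gives 0000000001101110, and CharBits.hex("nakkisoossi") gives the 44 hexadecimal digits shown in Example 4, four digits for each of the 11 characters.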
Example 5. Scientific and engineering computations require us to work with numerical data in floating point representation. This data has a more intricate format, but at heart every such number is just a sequence of bits.
scala> val d = scala.math.Pi
d: Double = 3.141592653589793
scala> tellAboutDouble(d)
I am a 64-bit word that Scala views as having format 'Double' ...
... my bits are, in binary, (0100000000001001001000011111101101010100010001000010110100011000)_2
... or equivalently, in hexadecimal, 0x400921FB54442D18
... Scala prints me out as 3.141592653589793
... indeed, my bits acquire meaning as specified
in the binary interchange format 'binary64' in the
IEEE Std 754-2008 IEEE Standard for Floating-Point Arithmetic
https://dx.doi.org/10.1109%2FIEEESTD.2008.4610935
... this format is a bit-packed format with three components:
(0100000000001001001000011111101101010100010001000010110100011000)_2
a) the sign
(bit 63): 0
b) the biased exponent
(bits 62 to 52): 10000000000
c) the trailing significand
(bits 51 to 0): 1001001000011111101101010100010001000010110100011000
... my biased exponent 1024 indicates that I am a _normalized_ number
with a leading 1-digit in my significand and
an unbiased exponent 1 = 1024 - 1023
... that is, in _binary radix point notation_, I am exactly
(1.1001001000011111101101010100010001000010110100011000)_2 * 2^{1}
... or what is the same in _decimal radix point notation_, I am exactly
3.141592653589793115997963468544185161590576171875
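The three components of the binary64 format can be extracted with shifts and masks. The following is a minimal sketch, not the source code of tellAboutDouble; the object DoubleBits and its functions are hypothetical names, while java.lang.Double.doubleToLongBits and java.math.BigDecimal are standard JVM facilities.

```scala
object DoubleBits {
  // Returns (sign, biased exponent, trailing significand) of a Double.
  def fields(d: Double): (Long, Long, Long) = {
    val bits = java.lang.Double.doubleToLongBits(d)
    val sign           = bits >>> 63                // bit 63
    val biasedExponent = (bits >>> 52) & 0x7FFL     // bits 62 to 52
    val significand    = bits & ((1L << 52) - 1)    // bits 51 to 0
    (sign, biasedExponent, significand)
  }

  // The exact decimal expansion of the number that the bits represent.
  def exactDecimal(d: Double): String =
    new java.math.BigDecimal(d).toPlainString
}
```

Applied to scala.math.Pi, DoubleBits.fields gives the sign 0, the biased exponent 1024, and the trailing significand 0x921FB54442D18, and DoubleBits.exactDecimal gives the 49-digit decimal string shown at the end of Example 5.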
After these examples, you may want to try it out yourself or continue reading further. You can find the source code for the tellAbout* functions in the next section.
Take any value in any of the aforementioned types.
Ask the console to tell about the value.
What you should observe is that, at heart, and as far as computing is concerned, everything is a sequence of bits.
Data and its format – a roadmap for this round
Our intent in this round is to make a guided tour of the most basic bit-level formats that every serious programmer should know. Not necessarily know by heart, of course, but the serious programmer should understand the implications that the bit-level and hardware-level representations have for all programming.
Our tour will feature the two main types of data:
Numerical data — integers and floating-point numbers, and
Textual data — characters and strings.
These formats are common to all of digital computing and all programming languages, not just Scala.