The Art of Programming

By Frans Faase

The larger part of my life, I have been writing programs. (A short account of this is given on my page My life as a hacker.) Through the years programming languages and compilers have been my area of interest. The last years, I have thought a lot about what programming really is, zooming in on the concept of implementation. I am getting the strange feeling that while writing programs, I am basically doing the same kind of things over and over again, although through the years the languages and tools have changed. Many repetitious tasks are now preformed with programs in a routine manner without human intervention. But writing programs has remained, for the bigger part, a repetitious task itself. And although we use the term "Software Engineering", writing programs is still more a craftmanship then an engineering task. There is a host of imperative programming languages, e.g., languages in which imperative programs are written, but a lack of "implementation" languages, e.g., languages that in exact terms describe how certain functionality is realized by means of a set of imperative programs running on a certain computer configuration.

Below, I would like to expose some of my thoughts about the problem described above. First of all, I would like to say that these thoughts are not all original. Some are almost as old as I myself. See for example M. Douglas McIlroy's article on mass produced software components. Or read the pages on Generative and Component-Based Software Engineering and Algorithmics. There is also some similarity with the ideas underlying Model Driven Architecture as presented by OMG. (I do not like the idea that it is based on UML, because I get the feeling that UML is flawed.) The title of this page is inspired by a multi-volume standard work by D.E. Knuth.

Contents

  1. Why do we program?
    1.1    The memory piramid
    1.2    Linear accessed memory
    1.3    Distributed computing
    1.4    Understanding the issue
  2. Specifying versus implementing
  3. Objects versus values
  4. The nature of types
  5. A specification languages
  6. The nature of implementing

1   Why do we program?

One of the main reasons why so many people in the world spend their time writing programs, is because computers and networks are slow and have limited memory and bandwidth. This may sound strange, in a world where computers are getting faster and faster every month, and where memory is getting cheaper and cheaper. Who would have thought of PC having one gigabyte of main memory a few years ago? And doesn't the average PC at home with a good video card beat the average super computer of 10 years ago with ease? The reality is that the demand for computing power has gone up even faster.

So, I boldly state that we program because computers are too slow for our demands. Most serious programming, where speed is a requirement, is still done in the C programming language. Their is no denial that the C is so popular because it produces such fast executables.

1.1   The memory piramid

The average computer is based on the piramid memory model, where at the top you have a little memory which can be accessed very fast and at the bottom huge amounths of memory which is very slow. In a PC, the memory piramid typically consist of the following kinds of memory from the top to the bottom:

Kind of memory average size (bytes) average access time (s)
The registers in the CPU 103 10-9
The on-board primary cache 104 10-8.7
The secondary cache 105 10-8.5
The main memory 109 10-8
The hard disks 1010 10-5
The local network 1012 10-3
The Internet 1016 101

A lot of programming just consist of moving data between the various kinds of memory. At each level buffering and caching takes place of data originally stored at a lower level. (Note the invers relation between the average size in bytes and the average access time in seconds in the above table.)

1.2   Linear accessed memory

Another thing which makes programming difficult is the fact that memory can only be accessed as a linear array of bytes. A huge amounth of effort is put in organizing memory in more accessible way. In a sense, a file system consisting of files and directories, is nothing else than a mechanism to transform an array of about 1010 bytes in something more useful. The same holds for all database systems.

1.3   Distributed computing

Instead of one big computer this world is filled with more than 108 computers loosely connected by networks. The bottom line to this is that computers are so slow that we need so many of them. Communication between computers is another area that makes programming a difficult task.

1.4   Understanding the issue

We are so used to slow computers that most people have a hard time seeing the problem. Even famouse people have stated that we should first write the program and than optimize it. Yet, writing a progam means writing something that can be executed. That means you are already talking about implementing it, and thus about writing the program in such a way that it can be executed within a certain time limit.

2   Specifying versus implementing

Usually, when people talk about a program specification, they refer to a document describing the desired functionality of the program or system using plain English (or whatever language they speak). And implementing is seen as the process of writing the program, or building the system, which behaves according to the given specification. Often, making the implementation also means working out all the details that where not described in the specification document, and in the worse case reinterpreting the specification document to resolve all the errors that it contained.

In most cases the program specifications are not looked at any nor maintained, once the program is there. The program just serves as the specification. If after some time some additional functionality has to be implemented, then the programmers reverse engineer the code to determine the implemented specification. Some people even go as far as to state that the program is the specification. They simply state: "The code is everything there is". See for example eXtreme Programming.

In a sense they are right. The code is the ultimate reflection of both the specification (the what) and the implementation of the specification (the how). This reality is the cause of a lot of legacy code. I several times worked on projects where more than 90% of the code was more than 2 years old, and badly needed to be rewritten. But once a system works and has been tested, changing buggy code is not done, as the costs for introducing bugs is simply to high. The pain of legacy code is most felt when the system has to be reimplemented on a different platform. In some cases the best choice is just to forget about old code and rebuild the system from scratch.

These problems would not exist if we would have a means to just separate out the what (specification) and how (implementation) aspects of a program. Although several commonly used techniques such as Object-Orientation claim that they allow you to separate the specification from the implementation, they only do this to a very limited extend. At most they allow you to separate the interface and the implementation, but an interface of a class or component is not a specification. At most it is an indication of the primitives that are used to implement the specification.

The specification of a program or system is simply that what remains if you would abstract from all implementation details. Although this sounds like a clear cut between specification and implementation, the border line between the what and the how is not always that clear, because sometimes you can go on and on with finding more abstract representations of your system.

3   Objects versus values

Presently there are two kinds of data models. The one are the value based data models in which everything is considered to be a value and relationships between values. Relational databases are based on this concept. The other are the object oriented data models in which everything is considered as an object with properties and methods for changing the properties. In object oriented models, each object has a unique object identifier.

The idea to tie data and the operations on this data together is really a good idea. However, the basic operations on a data structure usually follow the nature of that data structure. (In many object-oriented programs, more than half of the methods are for manipulating and quering the data structures that the objects implement.)

The benefit of encapsulation is not relevant for specifications, because specifications try to abstract from implementation details, and the main purpose of encapsultation is to hide possible implementation details. The greatest danger of the object-oriented data model is the assumption that the best way to expres operations on collections of objects is by the methods of the objects. It is not for nothing that object query languages are being designed. However, attempts to define a functional object update language have failed, as the definition of the semantics of such a language are hard to define or produce very odd results. For me this has raised the question whether there can be any object-oriented specification language with well defined and intuitive semantics. The definition of functional update language for value based data models with simple and consistent semantics seems to be straight forward. Therefore, a better approach seems to be to start with a value based model and add a reference concept. This might results in a model which is even richer than the object-oriented model, but with simple and clear semantics.

For more about this also read Object-Oriented Considered Harmful.

4   The nature of types

The main purpose of having types in a language, is to say something about the correctness of expressions in that language. By attributing certain expressions with types, it is possible to reason about their correctness. Another use of it is to allow polymorphism, that it to have one name for a range of similar operations, which slightly differ based on the types of values that the operate on.

One way to define types is by giving the set of all distinct values that are valid for the type. However to arrive at a powerful typing system we have to allow infinite sets with possible infinite values. Although this seems to be a problem, we note that for checking whether a certain value belongs to a certain type it is not neccessary to first generate the whole set. Remember, we are in reasoning in the specification domain where it is not needed to have an finite realisation of types, as might be required in the implementation domain.

When types are represented as sets, the most natural definition of a subtype is to make it equivalent to a subset. A type A is thus a subtype of B, if and only if, the domain of A is a subset of B. Although, this may seem a very natural approach, it should be noted that in many object-oriented language, actually the reverse is true.

5   A specification language

The specification language presented here was designed with the idea to have a language that has rich set of concepts, yet does have a "simple" semantics. The specification language is described on three separate pages, each dealing with a separate aspect. These pages are:

6   The nature of implementing


My life as a hacker