For the larger part of my life, I have been writing programs. (A short account of this is given on my page My life as a hacker.) Through the years, programming languages and compilers have been my area of interest. In recent years, I have thought a lot about what programming really is, zooming in on the concept of implementation. I get the strange feeling that while writing programs, I am basically doing the same kinds of things over and over again, even though the languages and tools have changed through the years. Many repetitious tasks are now performed by programs in a routine manner, without human intervention. But writing programs has itself remained, for the bigger part, a repetitious task. And although we use the term "Software Engineering", writing programs is still more a craftsmanship than an engineering task. There is a host of imperative programming languages, i.e., languages in which imperative programs are written, but a lack of "implementation" languages, i.e., languages that describe in exact terms how certain functionality is realized by means of a set of imperative programs running on a certain computer configuration.
Below, I would like to set out some of my thoughts on the problem described above. First of all, I should say that these thoughts are not all original. Some are almost as old as I am. See for example M. Douglas McIlroy's article on mass-produced software components. Or read the pages on Generative and Component-Based Software Engineering and Algorithmics. There is also some similarity with the ideas underlying Model Driven Architecture as presented by the OMG. (I do not like the fact that it is based on UML, because I get the feeling that UML is flawed.) The title of this page is inspired by a multi-volume standard work by D. E. Knuth.
So, I boldly state that we program because computers are too slow for our demands. Most serious programming, where speed is a requirement, is still done in the C programming language. There is no denying that C is so popular because it produces such fast executables.
Kind of memory             | average size (bytes) | average access time (s)
The registers in the CPU   | 10^3                 | 10^-9
The on-board primary cache | 10^4                 | 10^-8.7
The secondary cache        | 10^5                 | 10^-8.5
The main memory            | 10^9                 | 10^-8
The hard disks             | 10^10                | 10^-5
The local network          | 10^12                | 10^-3
The Internet               | 10^16                | 10^1
A lot of programming simply consists of moving data between the various kinds of memory. At each level, data that originally resides at a lower level is buffered and cached. (Note the inverse relation between the average size in bytes and the average access time in seconds in the table above.)
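To make this concrete, here is a minimal Python sketch (my own illustration; the names and the crude eviction policy are just placeholders) of the read-through caching pattern: a small, fast level kept in front of a larger, slower one.

```python
class CachedStore:
    """A small, fast level (a dict) in front of a larger, slower backing level."""

    def __init__(self, slow_read, capacity=1024):
        self.slow_read = slow_read    # function standing in for the slower level
        self.capacity = capacity      # the faster level is smaller ...
        self.cache = {}               # ... so it can hold only part of the data

    def read(self, key):
        if key in self.cache:         # hit: served from the faster level
            return self.cache[key]
        value = self.slow_read(key)   # miss: fetch from the slower level
        if len(self.cache) >= self.capacity:
            self.cache.pop(next(iter(self.cache)))   # crude eviction
        self.cache[key] = value       # keep a copy at the faster level
        return value
```

The same shape recurs at every level of the table: CPU cache in front of main memory, page cache in front of disk, local replica in front of the network.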
In most cases, the program specifications are no longer looked at, nor maintained, once the program is there. The program just serves as the specification. If after some time some additional functionality has to be implemented, the programmers reverse engineer the code to determine the implemented specification. Some people even go as far as to state that the program is the specification. They simply say: "The code is everything there is." See for example eXtreme Programming.
In a sense they are right. The code is the ultimate reflection of both the specification (the what) and the implementation of the specification (the how). This reality is the cause of a lot of legacy code. Several times I have worked on projects where more than 90% of the code was more than two years old and badly needed to be rewritten. But once a system works and has been tested, changing buggy code is not done, as the cost of introducing new bugs is simply too high. The pain of legacy code is felt most when the system has to be reimplemented on a different platform. In some cases the best choice is simply to forget about the old code and rebuild the system from scratch.
These problems would not exist if we had a means to separate out the what (specification) and the how (implementation) aspects of a program. Although several commonly used techniques, such as Object-Orientation, claim that they allow you to separate the specification from the implementation, they only do this to a very limited extent. At most they allow you to separate the interface from the implementation, but an interface of a class or component is not a specification. At most it is an indication of the primitives that are used to implement the specification.
The specification of a program or system is simply what remains if you abstract from all implementation details. Although this sounds like a clear cut between specification and implementation, the boundary between the what and the how is not always that sharp, because sometimes you can go on and on finding ever more abstract representations of your system.
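To illustrate the distinction, here is a small Python sketch (my own example, not part of any existing system) in which the what is written down as a checkable predicate and the how as just one of the many possible programs satisfying it.

```python
from collections import Counter

def satisfies_sort_spec(xs, ys):
    """The 'what': ys is ordered and is a permutation of xs (no algorithm implied)."""
    ordered = all(ys[i] <= ys[i + 1] for i in range(len(ys) - 1))
    permutation = Counter(xs) == Counter(ys)
    return ordered and permutation

def insertion_sort(xs):
    """One possible 'how': a concrete program that should satisfy the spec."""
    result = []
    for x in xs:
        i = 0
        while i < len(result) and result[i] <= x:
            i += 1
        result.insert(i, x)
    return result

assert satisfies_sort_spec([3, 1, 2], insertion_sort([3, 1, 2]))
```

The predicate says nothing about how the ordering is achieved; insertion sort, quicksort, or a call to a library routine all satisfy the same specification.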
The idea of tying data and the operations on that data together is really a good idea. However, the basic operations on a data structure usually follow the nature of that data structure. (In many object-oriented programs, more than half of the methods are for manipulating and querying the data structures that the objects implement.)
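As an illustration (the class and its fields are invented for this page), here is a typical "domain" class in Python whose methods are, on closer inspection, little more than insertions, deletions and lookups on the data structure it wraps:

```python
class GradeBook:
    """A domain object that mainly exposes operations on its underlying dict."""

    def __init__(self):
        self.grades = {}                    # the data structure the object wraps

    def add(self, student, grade):          # manipulation: dict insert
        self.grades[student] = grade

    def remove(self, student):              # manipulation: dict delete
        del self.grades[student]

    def grade_of(self, student):            # query: dict lookup
        return self.grades.get(student)

    def passing(self, threshold=5.5):       # query: filter over the dict
        return [s for s, g in self.grades.items() if g >= threshold]
```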
The benefit of encapsulation is not relevant for specifications, because specifications try to abstract from implementation details, and the main purpose of encapsulation is to hide possible implementation details. The greatest danger of the object-oriented data model is the assumption that the best way to express operations on collections of objects is through the methods of the objects. It is not for nothing that object query languages are being designed. However, attempts to define a functional object update language have failed, as the semantics of such a language are hard to define or produce very odd results. For me this has raised the question whether there can be any object-oriented specification language with well-defined and intuitive semantics. Defining a functional update language for value-based data models with simple and consistent semantics seems to be straightforward. Therefore, a better approach seems to be to start with a value-based model and add a reference concept. This might result in a model that is even richer than the object-oriented model, but with simple and clear semantics.
For more about this also read Object-Oriented Considered Harmful.
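To make the contrast concrete, here is a small Python sketch (entirely my own, with hypothetical names) of the two styles: state hidden in objects and changed through their methods, versus plain values that are queried as a collection and "updated" by constructing new values.

```python
# Object style: state lives inside objects and changes as a side effect.
class Account:
    def __init__(self, owner, balance):
        self.owner, self.balance = owner, balance

    def deposit(self, amount):
        self.balance += amount               # in-place update, hidden behind a method

# Value style: an account is just a value; an "update" yields a new value.
def deposit(account, amount):
    return {**account, "balance": account["balance"] + amount}

accounts = [{"owner": "a", "balance": 10}, {"owner": "b", "balance": 200}]

# A query expressed over the collection of values, not via per-object methods:
rich = [acc["owner"] for acc in accounts if acc["balance"] > 100]

# A functional update: the original values are left untouched.
accounts_after = [deposit(acc, 5) for acc in accounts]
```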
One way to define types is by giving the set of all distinct values that are valid for the type. However, to arrive at a powerful typing system we have to allow infinite sets with possibly infinite values. Although this seems to be a problem, we note that for checking whether a certain value belongs to a certain type it is not necessary to first generate the whole set. Remember, we are reasoning in the specification domain, where it is not needed to have a finite realisation of types, as might be required in the implementation domain.
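As a small Python sketch (again my own illustration): a type such as "the even natural numbers" denotes an infinite set, yet membership of any given value can be decided without ever generating that set.

```python
def is_even_natural(value):
    """Membership test for the (infinite) type 'even natural numbers'."""
    return isinstance(value, int) and value >= 0 and value % 2 == 0

assert is_even_natural(42)        # no need to enumerate the set first
assert not is_even_natural(-4)
```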
When types are represented as sets, the most natural definition of a subtype is to make it equivalent to a subset. A type A is thus a subtype of B if and only if the domain of A is a subset of the domain of B. Although this may seem a very natural approach, it should be noted that in many object-oriented languages the reverse is actually true.
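Sticking to the same predicate view of types (a sketch of my own, not a feature of any particular language), the subset reading of subtyping looks like this:

```python
def is_int(value):
    return isinstance(value, int)

def is_digit(value):
    # 'digit' is a subtype of 'int' in the set sense: by construction, every
    # value accepted by is_digit is also accepted by is_int, so the domain of
    # 'digit' is a subset of the domain of 'int'.
    return is_int(value) and 0 <= value <= 9

assert all(is_int(v) for v in range(10))   # all digits are indeed ints
```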