Heroes of software engineering - Welsh and Quinn

CDC6600.jpgBy Ian Cottam, IT Services Research Lead, The University of Manchester.

This is the first in a series of blog posts on my heroes of software engineering. I hope you will find it (and subsequent ones) of interest.

# 1 – Welsh and Quinn

My first heroes are Jim Welsh and Colm Quinn. Why? Because their 1972 paper [1] was the first one that convinced me that there was a discipline worthy to be called software engineering and that what they did was a tour de force, leaving me open mouthed in admiration of the magic they had dared to conjure.

Welsh and Quinn did the work in 1971, their paper appeared in 1972, and I came across it around 1974, if memory serves. It was a time of mainly batch mode, room-sized mainframes. Networking was something a few futurists thought about, and our source and data files existed on punched cards or paper tapes, which were hand-carried to your computer centre or even sent through the post, with results coming back on large format line printer paper.

So, just what did Welsh and Quinn achieve that so impressed my 22 year old self? They brought Pascal to the British Isles from its origin in Switzerland. I say British Isles as they were based at the Queen’s University Belfast, Northern Ireland. From Northern Ireland their work quickly spread to Albion and beyond.

The Pascal programming language had an enormous impact in the 1970s and 80s. If you doubt this fact, you weren’t there. It doesn’t matter how many faults Pascal had (or has, as it is still used): it was such a massive improvement over what we had before (basically Algol 60, Fortran and Cobol). Its impact on both the worlds of higher education and industry is comparable with Java today.

When I say Welsh and Quinn brought Pascal to the British Isles, I mean they engineered a compiler for the language, and it was the manner of this engineering that caused Pascal’s designer (Niklaus Wirth) to later say:

“The project deserves special mention, because it should be considered as one of the earliest, genuinely successful ventures that were later to be called software engineering efforts.” [2]

Wirth had suggested the approach, but Welsh and Quinn had to make it work. Only a single compiler existed for Pascal; it was in Zurich, at Wirth’s home institution (ETH). It was itself written in Pascal, and in a single pass produced machine code directly for the CDC 6000 series. I can’t find any record of how big – in terms of statements or lines of code – the original compiler was. My guess is around 10,000 statements/lines. (If that seems smaller than you would guess, the reason may be that the trend for putting one statement on one line, and always nesting structure, had not yet caught on.)

The CDC 6000 was a big, expensive mainframe that few British universities had (if any). We all tended to have the British made ICL 1900 series, which had a wide range of size options that enabled both small and large institutions to purchase them with some commonality and portability of code. Queen’s University Belfast had a high-end ICL 1907 model.

So, let’s now outline the engineering plan (and pause to think if you would have taken this job on).

  1. The source of the ETH Pascal compiler was to be transported to Belfast.
  2. Welsh and Quinn would rip out the code generation sections and replace them with new sections that would generate ICL 1900 machine code.
  3. An ICL 1900 emulator would be written in Pascal.
  4. The new compiler and emulator would be hand carried (by Welsh and Quinn) to ETH Zurich to be compiled (bootstrapped in the vernacular) by the original Pascal compiler.
  5. The ICL 1900 emulator would be tested and debugged on the CDC 6000.
  6. The Pascal/1900 compiler would be tested and debugged on the CDC 6000 computer using the 1900 machine emulator.
  7. The Pascal/1900 compiler (binary and source) would be hand carried back to Belfast.

Obviously, any work carried out in Belfast was cheaper than work that needed to be done by the two men, away from home, in Zurich, and so the plan was refined to spend the minimum time there. (In fact they planned for a week in Zurich and they only slipped slightly, taking 8 days.) Let’s amplify the steps above somewhat.

1. The source of the ETH Pascal compiler was to be transported to Belfast

This isn’t mentioned in the paper [1], but my guess is that several boxes of punched cards were sent by post. Of course, in those days the two machines used very different character encoding sets. Again, I have to assume that something like the Unix tr (transliterate) command would have had to be written for Welsh and Quinn to work on the source. (Though, as the ICL 1907 computer was useless to them at this stage, it may be that they simply worked on paper, and produced a fresh card deck at the last minute to take to Zurich.)

2. Rip out the code generation sections and replace them with new sections that would generate ICL 1900 machine code

This is the meat of the project. There are no software tools to help you: it is completely a desk with paper and pencil exercise. Without going into all the details, Welsh and Quinn write that the two machine architectures were very different. This resulted in a lot of redesign work in two main areas: generating reasonably efficient code for arithmetic expressions, and setting up the static enclosing scopes before generating a call to a procedure. The remainder of the code generator was closer to a one-for-one replacement exercise. It is reported that these three areas of code generation took them a total of six person months.

Although the paper and pencil approach was forced on Welsh and Quinn, other engineers would later adopt it as a virtue: notably Harlan Mills at IBM where they named it the CleanRoom process of software development [3].

Vital to Welsh and Quinn’s engineering approach was something similar to what we might call today test-driven development. That is, for every construct that their new code generator could produce machine code for, they wrote one or more test programs and recorded the machine code text that they expected to be the results of compilation. Different from today is that they had no way to (unit) test these incrementally as they went along. For several weeks or months, they would work on paper, and then when they thought it was done, they would turn it all into a machine-readable format (punched cards in this case).

3. An ICL 1900 emulator would be written in Pascal

This is reported as being straightforward. Only 30 of the ICL 1900 instructions could be generated by the paper compiler, and so only that subset were emulated, together with 14 run-time support routines. This took them one-person month. I remind you, though, that no testing was possible at this time. Your code has to be close to essentially right first time. (H. Johnston is acknowledged as helping to write the emulator.)

4. The new compiler and emulator would be hand carried (by Welsh and Quinn) to ETH Zurich to be compiled and tested

Finally, we reach a point where a computer is involved! Just before that, though, Wirth reports in [2] that Welsh and Quinn had some trouble explaining to Swiss customs what punched cards were - and what they were worth. The matter was finally settled when customs were informed that the cards would be leaving Switzerland with them in a week’s time. (The fact that not all the holes in the cards were the same as on entry passed without mention on either part.)

The main reason for the slippage to eight days of work was the incompatible character encoding between the two machines and their operating systems. This took two days to resolve before work could really start. Welsh and Quinn, and later Wirth, don’t say how it was resolved. One of the main issues was that the designers of the CDC system, in their wisdom, had chosen an amazing convention for end-of-line/record (and I think similarly for end-of-file). If memory serves, on a CDC system two consecutive colon characters :: meant end-of-line, on the stated grounds that no one would write these characters normally! When the CDC card reader was reading the cards produced in Belfast it would often just stop as it had detected what it thought was end-of-record/file. If you ever wondered why Pascal’s text I/O is a bit weird, it was simply to hide this abomination from you.

5. The ICL 1900 emulator would be tested and debugged on the CDC 6000

This took just three compile/execute runs to fully debug.

6. The Pascal/1900 compiler would be tested and debugged on the CDC 6000 computer using the 1900 machine emulator

A few interesting statistics are reported in the paper [1]:

  • After some syntax errors were removed, eight compile/execute steps were needed for the new compiler to compile all the test programs and produce the expected results. Thirty-one errors were detected by this process and easily corrected.
  • In three more compile/execute runs, the new compiler could compile itself, generating a machine code binary file that could be plugged into the ICL 1900 emulator.
  • In three compile/execute/emulate runs, the new compiler, under emulation, successfully compiled all the test programs, generating identical code.
  • A single troublesome error remained: sometimes the compiler’s symbol table became corrupted. This proved hard to debug/fix under emulation and the decision was made to live with it and do the repair back in Belfast.

7. The Pascal/1900 compiler (binary and source) would be hand carried back to Belfast

The remaining error was quickly found back in Belfast and a patch applied to the binary to fix it. After compiling itself one more time, the patched binary could be abandoned and a working self-compiling Pascal/1900 compiler was available. I’ve always admired the wording Welsh and Quinn use in their paper [1], especially the second sentence.

“Thereafter the compiler ran correctly and compiled itself, generating a code file which was identical, word for word, with its own. No further errors have since been detected.”

Notes

  1. If you are wondering why something like an intermediate code was not used, well this is the project that demonstrated what a good idea such a thing would be for porting Pascal, and Wirth and his team then produced the Pascal P-code compiler that caused the porting effort to spread across the world (given that it wasn’t such an heroic task anymore).
  2. You might have further wondered – chicken and egg like – how the original Zurich compiler came to exist in Pascal. Well, in an earlier heroic effort, Wirth’s team (U. Ammann, E. Marmier and R. Schild) wrote the entire thing in Pascal on paper, and then one of them (Schild) locked himself away at home for a couple of weeks whilst he painstakingly hand compiled it into a lower-level language that was available on the CDC 6000 series.

References

[1] J. Welsh and C. Quinn, A Pascal Compiler for ICL 1900 Series Computers, Software–Practice and Experience, Vol. 2, 73–77 (1972).

[2] N. Wirth, Recollections about the development of Pascal, oberoncore.ru/_media/library/n._wirth_-_recollections_about_the_development_of_pascal_hopl_ii_.pdf.

[3] en.wikipedia.org/wiki/Cleanroom_software_engineering.

Posted by admin on 19 August 2013 - 10:00am

Submitted by John McGrann on 12 March 2015 - 5:41pm

Permalink

Great to see this article, I was an undergrad at QUB from 1973 to 1977 working almost exclusively in PASCAL. I completed a project and dissertation in the final year of my degree under the guidance of Jim Welsh, I wrote a bottom up syntax analyser for Pascal for the project so that he had a working example of why only Top Down syntax analysis would work. His complier construction course was a classic and his notes superb.

Add new comment

The content of this field is kept private and will not be shown publicly.
By submitting this form, you accept the Mollom privacy policy.