By Ian Cottam, IT Services Research Lead, The University of Manchester.
This is the second in a series of blog posts on my heroes of software engineering. The first post in the series focused on Welsh and Quinn.
#2 McIlroy and Thompson
I estimate that at least once a (working) day for the last 35 years I have typed a (UNIX) pipeline into a terminal window. Today, for example, on my Mac I did this:
./listusersdropoff <tttt |
grep -v manchester.ac.uk |
grep -v mbs.ac.uk | wc -l
This tells me how many people external to Manchester are using our file DropOff service. (If you leave off the last stage in the pipeline, it gives the list of external users instead.) Interestingly, my shell script (listusersdropoff) is itself pretty much just another pipeline, one that finds email addresses in our logs, and so it is a good illustration of something Brian Randell once said: "Good ideas are made great when applied recursively" (or words to that effect).
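To make the recursion concrete, a script like listusersdropoff might itself look something like the sketch below. This is my own hypothetical reconstruction, not the real script: the sample log lines (and their format) stand in for the actual DropOff logs, which I haven't seen.

```shell
#!/bin/sh
# Hypothetical sketch of a listusersdropoff-style script: itself just a
# pipeline that pulls email addresses out of log lines. The sample log
# below is invented for illustration; the real log format is unknown to me.
cat <<'EOF' > /tmp/dropoff-sample.log
2013-05-01 10:02 sent-to: Alice.Smith@example.org size=12k
2013-05-01 10:07 sent-to: bob@postgrad.manchester.ac.uk size=3k
2013-05-01 10:09 sent-to: alice.smith@example.org size=1k
EOF
# Extract anything shaped like an email address, normalise case, dedupe.
grep -Eo '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}' /tmp/dropoff-sample.log |
tr '[:upper:]' '[:lower:]' |
sort -u
```

Feeding that into the `grep -v ... | wc -l` pipeline above then counts only the external addresses: a pipeline whose first stage is a pipeline.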
My heroes today are Doug McIlroy and Ken Thompson. The latter is known to all as the main inventor and implementor of the UNIX operating system (amongst many other software engineering things of note). Thompson's name is often quoted alongside the late Dennis Ritchie's, another hero, but not of today's tale. Today his partner in heroics is Doug McIlroy, who was actually Thompson's manager and boss at Bell Labs. (As someone who sometimes wears a manager's hat himself, I'm pleased to see that managers too can still be software engineering heroes.)
You may be surprised to know that there was once a time when UNIX did not support pipes. It was before my time (which, for me, began in the mid 70s with Version 6 UNIX), and may even have been before Bell Labs let UNIX escape to the world's universities and research labs. The evangelist for pipeline programming, and its designer and inventor, was Doug McIlroy. The man who made it all work, so efficiently and beautifully, was Ken Thompson.
To those few software engineers who had studied dataflow or functional programming, the magic and utility of pipeline programming (aka software componentry, software tools, or filters) was obvious. To mathematicians it was function composition brought to life. Mathematicians use a higher-order operator, often called fat-dot, to compose functions. For example, given two functions:
f: A → B
g: B → C
g ⋅ f
is a function from A → C (their composition) that first applies f and then g to f's result. Later, when some mathematicians were also software engineers, fat-semicolon was introduced, in the obvious way, so you could compose the two functions, like so
f ; g
which might be more natural to many of us left-to-right readers.
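In shell terms, the pipe is exactly this left-to-right composition: f | g applies f to the input stream and then g to f's result. A toy illustration, where tr plays the part of f (lowercase everything) and sort plays g:

```shell
# Function composition, pipeline style: "f ; g" in maths is "f | g" here.
# f = tr (lowercase the stream), g = sort (order the lines).
printf 'B\na\nC\n' | tr '[:upper:]' '[:lower:]' | sort
# prints:
#   a
#   b
#   c
```

Swap the order of the stages and you compose the functions the other way round, just as g ⋅ f differs from f ⋅ g.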
Now, under any UNIX-like operating system all files have the same single type: Byte*, i.e. a finite sequence of zero or more ordered bytes. Hence, in one sense at least, you don't need to worry about whether the outputs and inputs of two programs are of the right types to compose. Of course, how those byte sequences are interpreted is up to the programs concerned.
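One small sketch of what that single type buys you: raw binary bytes flow through the same pipes, and the same tools, as text does.

```shell
# Everything is a byte stream: binary data from /dev/urandom composes
# with the same tools as text. wc -c simply counts bytes, whatever
# those bytes happen to mean.
head -c 16 /dev/urandom | wc -c
```

Whether the stream is English prose, log lines, or random noise, the pipe neither knows nor cares.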
McIlroy was educated as both an engineer (first degree) and a mathematician (graduate degree), so it's not clear to me which way round his design for pipelining came about. That is, did he want an implementation of function composition, with programs as the functions, or did he see an engineering need and use the elegance a mathematician is taught to bring it to life? History seems to favour the latter: he had spotted numerous scripts where the output of one stage was the input to the next. But, as he is still working (in his early 80s), perhaps he can tell us, as one of my hopes for this blog series is to prompt some recollections from the people involved.
Once Thompson was persuaded, and I think history says he took a little persuading, it did not take a man of his software engineering skills long to implement it. One could have cheated and made each pipe connection a shorthand for creating a temporary file, written by the left hand side of the pipe and then read by the right. (In fact, when Microsoft copied the idea in MS-DOS, that is exactly how they did it.) Thompson, in I think just a day or two, produced the most beautiful implementation, in which all the programs ran as fully independent processes, in parallel as far as the scheduler was concerned, while behind the scenes in-store buffer queues were used to transfer the data between pipeline stages, the UNIX scheduler automatically blocking a stage whenever it had to wait for input or its output buffer queue filled up. Decades later, now that we all have multicore machines, pipeline stages can truly execute in parallel, with no change in notation for the user and no change to the UNIX scheduler either, other than that in general it has to know it has multiple cores to which it can attach processes (there being nothing special about pipelines). That is heroic engineering that is sustainable.
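You can see for yourself that the stages really do run as concurrent processes, not left-to-right to completion. In the sketch below, yes produces an infinite stream of "y" lines; if it had to finish before the next stage started, the pipeline would never terminate. Instead head exits after three lines and the blocked writer is stopped.

```shell
# Proof that pipeline stages run in parallel: 'yes' never terminates on
# its own, yet this pipeline returns at once, because 'head' reads just
# three lines, exits, and the scheduler/kernel stops the blocked writer.
yes | head -n 3
# prints:
#   y
#   y
#   y
```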
Many people would have stopped there, but not Thompson. He then pulled a night shift so that he could modify every single one of the UNIX tools to (optionally) act as a pipeline filter, reading standard input and writing standard output. When all his colleagues came in the next day they had a festival of pipeline creation: a party I would have liked to have been at. I think this was also the time that the standard error channel was invented, as, clearly, you didn't want your error messages disappearing down the pipe. I expect Thompson modified all the UNIX programs to support stderr too on that heroic night.
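A quick sketch of why a separate error channel matters: with only standard output, the complaint from ls below would flow into wc and be counted as if it were data; because it goes to stderr, it bypasses the pipe entirely (here we discard it, to keep the example quiet).

```shell
# stderr bypasses the pipe: ls's error message does not reach wc, so
# wc -l correctly counts zero lines of real output. Without a separate
# error channel, the error text would have been counted as data.
ls /no/such/directory 2>/dev/null | wc -l
```

Drop the 2>/dev/null and the message appears on your terminal, while the count on stdout is unchanged.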
Given how much time people spend today on user-interface issues, you may be surprised to know that the vertical bar symbol was not the first choice for the pipe symbol. I think ^ and > were both used/debated, and certainly I remember for a long time on UNIX-based systems that you could use the exclamation mark !. Nowadays, the vertical bar is so synonymous with pipes that some programmers pronounce it as pipe-symbol. On old keyboards from the 70s the | symbol even had a little gap in it, which I thought was wonderful as you could imagine the data streaming through the gap in the pipe.
1. As I mentioned, at 81 Doug McIlroy is still working, and so is Ken Thompson (at 70), on the Go language at Google.
2. In the 70s, I was in particular awe of two UNIX programs, one designed and written by McIlroy and one by Thompson (of many, of course, for both of them). You might think that for McIlroy I am talking about diff, which was an amazing achievement, but I preferred his original spell (checking) program for the 16-bit PDP-11. Why? Because it was a good spell checker that worked without actually storing a dictionary. If you don't know how that could be, I'll leave you to study the history books (well, via Google of course).
In the case of Thompson, it was his line editor ed. Ed contained, and likely still contains, much of what a software engineer needs to know: from trusting how well the underlying UNIX system could handle moving around any text file in a random order, to the pioneering use of compiling (pattern matching) regular expressions into optimal code for a finite state machine. But my all-time favourite bit of engineering, which Thompson generously says was already folklore, was how to implement a block move of a sequence of text lines to another part of the file. For a forward block move, you simply reverse the lines you are moving, then you reverse the lines from the end of that block to the point you are moving to, and finally you reverse all of the above. When you remember that text lines are stored as simple pointers to where the text really is, such an implementation, which uses no extra space, is amazingly efficient as well as an example of beauty in engineering.
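The three-reversal trick can be sketched in a few lines. This is my own illustration, not ed's actual code: I model the line pointers as a bash array and move the block b..c to the end of the file.

```shell
#!/usr/bin/env bash
# My sketch of the folklore three-reversal block move (not ed's source).
# To move lines[i..j] to just after position k: reverse i..j, reverse
# j+1..k, then reverse the whole span i..k. No extra space is needed,
# and with lines stored as pointers each swap is cheap.
lines=(a b c d e f)
reverse() {                 # reverse lines[$1..$2] in place
  local lo=$1 hi=$2 tmp
  while [ "$lo" -lt "$hi" ]; do
    tmp=${lines[lo]}; lines[lo]=${lines[hi]}; lines[hi]=$tmp
    lo=$((lo + 1)); hi=$((hi - 1))
  done
}
# Move the block b..c (indices 1..2) to after f (index 5):
reverse 1 2                 # a c b d e f
reverse 3 5                 # a c b f e d
reverse 1 5                 # a d e f b c
echo "${lines[*]}"          # prints: a d e f b c
```

Each element is touched at most twice, and the block lands exactly where it should, with nothing allocated along the way.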