By Edward Fisher, University of Edinburgh, Raquel Alegre, University College London, Grace Cox, University of Liverpool, Edward Smith, Imperial College London, Violeta Holmes, University of Huddersfield, Catarina Martins, University of Manchester.
This post is part of the Collaborations Workshop 2017 speed blogging series.
A critical concept within software sustainability is the correct choice of a suitable software architecture that can be supported long-term and can be easily adapted given changes in the base requirements. But what do we mean by software architecture, what can we draw from the historical development of software since the birth of computer systems, and from personal experience and intuition?
A software architecture is a roadmap or blueprint for use during the development cycle. It is also a method of segregating work packages amongst multiple developers or the logical separation of tasks in a single developer’s sequential work pattern. As with building architecture, there are choices that can be made at the block diagram level that radically impact both the end functionality of the work and the length and difficulty of the path that takes the designer to that end. Again, as with building architecture, there can be an inherent artistry and beauty of optimum design within software architecture.
Getting the architecture wrong can lead projects into dead ends, stretch timescales and, in some cases, bring about the downfall of a project through unmanageable interactions between units, between developers or between memory structures. From a developer’s perspective, their personal development scope may creep, a significant cause of burn-out, as boundaries within a design are ill-defined and project managers have little idea as to whose responsibility a particular function is. Worse still, staffing issues may present developers and managers with an only-one-person-to-do-this dilemma.
In his famous book of collected papers, David L. Parnas sets out different architectures, the classes of design and the advantages of modular design. Historically, software and indeed hardware architectures and hierarchical systems have employed graph theory, modular design, the concept of parameterization, and data and control flow techniques, all aiming at a structure that is optimal from both a functional and a development point of view.
In this short speed blog, we aim to highlight that software architecture, your blueprint for development if you will, is as critical to design as correct version control or ensuring code is well commented. Our end message is one of education in both a bottom-up (developer-led) and a top-down (supervisor-led) manner.
Together with other sustainable software concepts, we hope all our code can become sustainable for others and for our future selves. “Better software, better research”, the Software Sustainability Institute’s go-to raison d’être, can now become “better architecture, better maintainability” and “better modularity, easier development”.
Software Architectures and Abstraction
A key concept within design in any field, whether that be building design, mechanical engineering, electronics engineering, software engineering or even management, economic or social systems engineering, is the concept of design abstraction (Herbert A. Simon, “The Sciences of the Artificial”). But what do we mean by design abstraction? In essence, it is the wilful ignorance of the nitty gritty details, allowing us to view the bigger picture.
In software development, we might separate a design into layers, perhaps the abstraction of services or functionality, perhaps the boundaries between sub-services or classes of functionality. Perhaps we abstract based on interfaces to either the real world or the end user of the software. Let’s call this concept functional abstraction, the classic case being an application with a GUI front end, where we may logically separate the functions and development of i) the GUI graphical code, ii) the handling of data processes and sub- or low-level functions and iii) event handling such as user mouse clicks on a GUI button or a hardware interrupt.
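As a rough illustration of functional abstraction, the three concerns above can be kept in separate classes that only meet at the point where they are wired together. This is a minimal sketch rather than a recommended design: the class names (DataModel, TextView, EventDispatcher) and the "button_clicked" event are hypothetical, and a plain text view stands in for a real GUI toolkit.

```python
from typing import Callable, Dict, List


class DataModel:
    """Data-handling layer: knows nothing about display or input events."""

    def __init__(self) -> None:
        self.samples: List[float] = []

    def add_sample(self, value: float) -> None:
        self.samples.append(value)

    def mean(self) -> float:
        return sum(self.samples) / len(self.samples) if self.samples else 0.0


class TextView:
    """Presentation layer: turns model state into something a user can read."""

    def render(self, model: DataModel) -> str:
        return f"{len(model.samples)} samples, mean = {model.mean():.2f}"


class EventDispatcher:
    """Event-handling layer: maps named events to callbacks."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[float], None]] = {}

    def on(self, event: str, handler: Callable[[float], None]) -> None:
        self._handlers[event] = handler

    def fire(self, event: str, payload: float) -> None:
        self._handlers[event](payload)


# Wiring: the only place where the three layers meet.
model = DataModel()
view = TextView()
events = EventDispatcher()
events.on("button_clicked", model.add_sample)

# Simulate two user interactions, then redraw.
events.fire("button_clicked", 2.0)
events.fire("button_clicked", 4.0)
summary = view.render(model)
print(summary)
```

Because the view only reads the model and the dispatcher only calls a callback, any one of the three could in principle be replaced (for example, swapping the text view for a real GUI) without touching the other two.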
We can also separate a complex system based on the notion of an environment abstraction level. Hardware and the electrons that flow within transistor logic circuits are naturally the lowest level of an abstraction stack, but we can easily see that from hardware we can move up an abstraction stack of hardware, firmware, BIOS, low-level functions and right up to the application layer software and the end-user. One of the best examples of abstraction based on environment separation is the Open Systems Interconnection (OSI) model of network protocols, where we have the physical layer (hardware, wires, modulation types, encoding), the data link layer (data framing, error correction coding), the network layer (packet addressing and routing) and several other layers abstracting different environments for code. Right at the top of this stack we again have the application layer, allowing the user to be entirely blind to the nitty gritty of voltage levels of bit transmission at the physical layer.
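The layering idea can be sketched in a few lines of code. The example below is a toy, not an implementation of any real protocol: the four layer names are a simplified, hypothetical subset of the stack, and the bracketed "headers" are invented purely to show that each layer touches only its own wrapper and treats everything inside as opaque.

```python
from typing import List

# Hypothetical, much-simplified stack, ordered from the top layer downwards.
LAYERS: List[str] = ["application", "network", "data_link", "physical"]


def send(message: str) -> str:
    """Pass a message down the stack; each layer adds its own header."""
    frame = message
    for layer in LAYERS:
        frame = f"[{layer}]{frame}"
    return frame


def receive(frame: str) -> str:
    """Pass a frame up the stack; each layer strips only its own header."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"{layer} layer cannot parse frame")
        frame = frame[len(header):]
    return frame


wire = send("hello")
print(wire)
print(receive(wire))
```

The application at either end only ever sees the original message; what happens to it on the way down and back up is hidden behind the layer boundaries.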
The Academic Problem
So, it seems that software architecture is a useful technique, building on both abstraction layers in other engineering domains and the history of computers and software itself. So why are such concepts difficult to implement within academia? Why do we still see code of 1000+ lines within a single file? Why do we still see high interdependence and version control issues within code developed within academia?
There are three issues within academia that separate it, from a code development perspective, from industry. The first is that as academia quite rightly focuses on the results, rather than the path, or the longevity of that path, there is a certain attitude of “just get it working” within academia. This is principally historical, where the methods may be discussed in a paper, but the headlines are always the results and a few paragraphs within the method section may be swamped by the mathematical algorithms, signal processing and experimental method rather than the specifics of a project’s software code.
The second issue is one of project and indeed code longevity. In industry, new products may rest on development and sub-units of previous products. While in academia, projects may generally follow on from each other, they often stand upon the results and concepts from a previous project rather than implementation of code. When this is coupled with typical four- or five-year research grants, often three- to four-year PhD projects, and often short one- to two-year postdoc contracts, it becomes difficult to ensure project and personal work effort longevity.
The third issue is one of team size. In industry, the larger the work package and the more invested in a leading product, typically the larger the development team. As the lifetime of a product can be decades, team sizes can range from tens to hundreds of developers. While ideally at the grant writing stage academia has a similar mechanism, the nature of research often means work packages grow during the lifetime of a project, while staffing budgets are pre-allocated costs with little ability to transfer money from equipment into staff. The result is that a project may budget for one full-time postdoc and one PhD student, but the work package may quickly grow too large for such a small team. Likewise, the student’s project needs to be of limited scope, amenable to a coherent write-up, shunting any growth of a work package onto the shoulders of the postdoc. Software development can therefore quickly become a solitary exercise.
Within academia, the pressure to get results often restricts both PhD students and research staff to being self-taught with respect to software development or engineering best practices. It does help if a staff member has previously used the same programming language and concepts, or has industrial experience. However, this may not be the case, as highlighted by the variety of disciplines within the Software Sustainability Institute’s Fellowship Programme.
To add fuel to the fire, academia, in complete contrast to industry, has a continuous professional development (CPD) issue. In industry, if a new skill is required to complete a work package, a staff member may easily go on a suitable training course. But in academia, there is an unwritten rule that informal training and osmosis from research papers is adequate. Further, while universities have in-house training, and may indeed pay for small, cheap CPD courses, if there is no dedicated funding written into a grant proposal, then postdocs and PhD students struggle to obtain cutting-edge advanced training. As an example, Oxford University offers a digital signal processing (DSP) course at £2,500, with a likely total cost of £3,000. This is beyond the scope of an individual’s finances and beyond the scope of the small bits and pieces within a research grant. Likewise, funding may not be available from an institution. External funding may be available, but why should professional development, and the ability to perform world-class novel research, become a numbers game with highly competitive funding routes?
To summarise then, the major issues are the academic "just get it working" attitude, project and code longevity, small or indeed single-person teams, and the issues regarding continuous professional development, training and self-tuition.
Solutions and Conclusions
Luckily, there are some solutions to these issues, although most are highly sensitive to the will, opinion and funding of a research group’s principal investigator. They are also highly sensitive to institution infrastructure and a critical mass of like-minded individuals, although they are also highly influenced by the will of software development staff.
As many research groups have some form of legacy project active when a new member starts, one solution would be a research group technical introduction. Here a new PhD student could be shown the location of a group’s Subversion repository, could become accustomed to the various tools used within the group and would pick up stylistic elements through osmosis. Shared code would add to this introductory ethic, although more formal training may be worthwhile, particularly within the first year of a new role. Companies sometimes use style guides for new staff members, coupled of course with formal and informal CPD. This idea of training prior to development is particularly prevalent in the American PhD model.
A primary solution, closely linked to the ideas of abstraction and architecture, would be to instil a group or institute-wide work ethic of drawing an architecture prior to the start of development. Taking a top-down approach to the issue, supervisors would need to implement periodic design reviews, similar to industry. Here, the separate modules could be discussed prior to code development. Libraries and code shared by other staff could also be incorporated, minimising “reinventing the wheel” and maximising research output. Certainly, such a shared code, shared library and perhaps shared modular design work ethic, promoted by a group’s leader, would instil a robust team effort attitude, thereby breaking the issues with solitary code development.
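One lightweight way to bring an architecture to a design review is to write down the module boundaries as interfaces before any real code exists. The sketch below uses Python's abstract base classes for this. The module names (DataSource, Analysis) and the two placeholder implementations are hypothetical and serve only to show how separately developed modules can be combined once the interfaces are agreed.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List


class DataSource(ABC):
    """Agreed interface for whichever module supplies the raw data."""

    @abstractmethod
    def read(self) -> List[float]:
        ...


class Analysis(ABC):
    """Agreed interface for whichever module processes the data."""

    @abstractmethod
    def run(self, data: List[float]) -> float:
        ...


class InMemorySource(DataSource):
    """Placeholder source, e.g. written by one developer for early testing."""

    def __init__(self, values: Iterable[float]) -> None:
        self._values = list(values)

    def read(self) -> List[float]:
        return list(self._values)


class PeakFinder(Analysis):
    """Placeholder analysis, e.g. written independently by another developer."""

    def run(self, data: List[float]) -> float:
        return max(data)


def pipeline(source: DataSource, analysis: Analysis) -> float:
    """Top-level block diagram: depends only on the agreed interfaces."""
    return analysis.run(source.read())


result = pipeline(InMemorySource([1.0, 5.0, 3.0]), PeakFinder())
print(result)
```

The pipeline function never mentions a concrete class, so a shared library from elsewhere in the group can be slotted in as long as it honours the same interface.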
Groups such as the Software Sustainability Institute and training providers such as Software Carpentry aim to allow academic staff to brush up on worthwhile software development skills. A route towards increased code sustainability would be increased visibility and advertisement of such training courses. If we again concentrate on students or staff within the first year of their role, pressures on supervisors can be alleviated by both widening the scope of Software Carpentry courses and providing supervisors with updated lists of both formal and informal training services. Such resources would be highly amenable to institutional level administration, where multiple students from multiple research groups and institutions can be provided with training information. Such top-down approaches allow a level of circumvention when academic-to-academic discussions become infrequent or strained due to the natural separation of research interests.
An immediately effective solution would be checklist-style development aids and tools designed to check or improve software. A software “linter” can be used to flag suspicious and non-portable code that may hide bugs, and Tuning and Analysis Utilities (TAU), developed at the University of Oregon, can be used to profile parallel software and so help improve its performance. Some languages such as Verilog and VHDL are amenable to checklist development as the design must pass various stages such as syntax checking, synthesis, functional simulation, translation into a netlist, mapping to hardware and placement and routeing onto that hardware. Likewise, languages such as Python and Java support unit testing, allowing a series of automated checks to form a software version of the robust signoff checks used within the microelectronics industry.
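As a small sketch of what such a signoff check might look like in Python, the example below uses the standard library's unittest module. The moving_average function and the three test cases are hypothetical stand-ins for a piece of research code and the behaviour its authors have agreed it must satisfy.

```python
import unittest


def moving_average(values, window):
    """Return the simple moving average of `values` over `window` points."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


class MovingAverageSignoff(unittest.TestCase):
    """Each test is one item on the checklist the code must pass."""

    def test_known_values(self):
        self.assertEqual(moving_average([1, 2, 3, 4], 2), [1.5, 2.5, 3.5])

    def test_window_longer_than_data_gives_empty_result(self):
        self.assertEqual(moving_average([1, 2], 3), [])

    def test_rejects_non_positive_window(self):
        with self.assertRaises(ValueError):
            moving_average([1, 2, 3], 0)


def run_signoff() -> bool:
    """Run every check and report whether the code may be signed off."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MovingAverageSignoff)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    print("signoff passed" if run_signoff() else "signoff failed")
```

Run on every commit, a suite like this plays a similar role to the staged checks in a hardware flow: code that fails any item simply does not move forward.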
To conclude, academia often has problems maintaining software architectures and development best practices, but there are some solutions. Some can be self-taught, others require a team spirit to be inspired and others require institutions to invest in continuous professional development. The role of the software architecture can make or break the development cycle, and getting it right can allow just the right mixture of solutions to obtain both better software and better results, a goal we can all agree on.