
The VMS approach to Software Engineering
Writing great code that survives hypes, cutbacks and takeovers

Camiel Vanderhoeven camiel@camicom.com


www.camicom.com
Contents
Author’s Note
Introduction
The Team
    Start with a small team of grownups
    Grow the team
    Take ownership
    Value team culture
    Look beyond the team
The Design
    Start with a design
    Keep it simple
    Make it modular
    Maintain the design
Testing
    Create meaningful tests
    Continuous regression testing
    Employ in-use testing
Procedures
    Maintain a proper workflow
    Err on the side of caution
    Perform peer reviews
    Manage your code streams
    Keep to high standards
    Fix your bugs
Conclusions
Acknowledgements

Author’s Note
This article, though never published, was written well before the deal between HP and VMS
Software, Inc. was announced. Please keep that in mind if its conclusion seems a bit gloomy to you.

Netterden, August 7th 2014


Introduction
I've been brooding over the subject of this article for a while now. I delivered a talk on it at the 2010
VMS advanced technical boot camp, but before that, I believe I can trace its origins back to the
following event.

At a tradeshow a few years ago, I was approached by a gentleman I shall for obvious reasons
henceforth refer to as "the Beard". The Beard had been informed that I dabbled in code writing a
little, and after the barest socially necessary show of interest in what I did for a living, he
proceeded to tell me about the new software development methods his company was developing.
This was one company that was taking the hype of cloud-computing to a whole new ethereal level!
Their software model consisted of loose functions written in many different languages executing on
many different systems and architectures.

The Beard introduced me to his boss, whom I shall refer to as "the Professor", because the fact that
he was one was the first thing I was told about him. This Professor explained that the beauty of their
scheme was that the interfaces between various functions were more or less loosely defined.
Requests would be made (of the cloud) to provide a function that would do such-and-so, and each
system connected to the cloud could provide an answer. This was presented as a huge advantage in
development agility and ease of management. Finally, no more worries about component versions
and interface definitions!

My mind was sufficiently boggled at the time that the only thing I could think of asking was, "Well,
but how do you design such a system, and how do you debug it?" The brilliant answer being, "That's
the beauty of it! You don't need to spend time designing your software anymore. It will design itself!
You only have to test the individual functions, and you don't need any more debugging than that!" At
this point, I got worried and explained a little bit about the kind of projects I had been involved in [1],
the mission critical nature of these, the fact that sometimes people's lives or the fate of a nation
depended on large systems, and how I spent most of my time intricately crafting and designing things
before setting out to code any of it. To no avail. "Great, so that means you can really see the
potential of this! Imagine the time you could save if you didn't have to do this." This was my cue to
start looking for a polite way to end the conversation and find the nearest exit.

This is, of course, an extreme example, almost to the point of being ridiculous, but I do believe it to
be symptomatic of the current emphasis on cheap, fast software development. So, as a contrast, and
hopefully as an eye-opener to those developers who have never had the uplifting experience of
being offered a glance at a truly magnificently engineered piece of code, I decided to write an article
on the software engineering practices employed by the engineers who wrote VMS.

Where practical (the scale of the efforts and the size of the development teams I’m part of are
usually much smaller), I try to work and code in accordance with these practices.

Some of these practices come straight from the Digital Software Engineering Manual; many others
were developed by the VMS engineering team. Please read this article as a tribute to those
engineers, many of whom have very kindly offered suggestions for this article.

[1] I have worked on the design of systems in such diverse fields as healthcare, meteorology, nuclear safeguards
and government inspection processes. Many of these systems are of a mission critical nature for financial
reasons, some for safety reasons.

The Team
Start with a small team of grownups
You should start out with a small team of people who know what they’re doing and don’t need a lot
of hand-holding. The VMS team started with just 3 people working out the initial architecture, and
the entire VMS V1 team consisted of 24 people.

Experienced developers will be able to avoid many subtle yet costly mistakes. A smaller team also
means less time spent on interpersonal communication [2] and overhead.

Grow the team


As the team grows, it is good practice to have the old-timers serve as mentors and ‘design
documentation’ to the new engineers. Often, the paper documentation is incomplete; even the best
documentation occasionally misses a small spot. Even documentation that is complete never
provides the complete background information required to understand all the decisions that were
made. For this reason the senior engineers will always be the ultimate fallback for design questions.
Having engineers on the team who can recall why the code turned out the way it did, and why
other solutions were rejected, is invaluable because it helps rule out more dead ends when faced
with similar issues (“I don’t think that approach will work, because we tried it before in a very similar
bit of code.”)

The mentor-apprentice relationship is not only meant to transmit specific design information to new
engineers, but also to instill into them the overall engineering culture of creating correct and reliable
code by understanding it.

Take ownership
Each bit of code in a project should have an owner. Individual ownership puts control and support
in the hands of the person who understands the code best. It also provides accountability
by holding people responsible for solving problems in code they have produced.

Value team culture


Engineering culture can make or break a project. Engineers should see their work as more than “just
a job”. Be proud of what you do, and treat the components you own as your babies. Know who the
experts are in all areas, and respect them for that. Keep lines of communication short and informal;
if you know that person A is the best person to answer your question [3], send him an email, or just
walk over to his cubicle and ask. In a team that works this way, the engineers feel they are part of a
united group that is almost like family. Keep up morale to keep down staff turnover and help with
day-to-day motivation.

Look beyond the team


You should foster a strong community with your users. Consider your users to be part of your
extended team. Being in direct contact with the ultimate users of your software can be awkward, but
in the end it will prove to be very valuable. It will teach you what is actually important to the people
who buy your software, rather than what seems important to you. Listen to users’ feedback and
factor that into your decisions.

[2] Don’t get me wrong here; interpersonal communication is of vital importance to a team. Precisely for this
reason, it becomes increasingly difficult to maintain coherence within a team as it increases in size.
[3] Make sure that each team member is aware of who the experts are in each area.

Customers depend on your work. It is very satisfying to know that a hospital, bank, stock exchange,
nuclear facility or wafer lab is relying on the quality of your efforts to accomplish their goals. On the
other hand it is humbling and scary to also know that when a problem is discovered, you are
potentially responsible for some serious business consequences, or worse.

The Design
Start with a design
Quality cannot be added later. Your design should be well thought through from day one. Central
features (such as security and auditing) should be built into the design, not added in version 2 as an
afterthought.

A good design process works “top down”. Start with requirements, then a functional specification,
then a high level design, then work down into the details. A good, well-structured design meets the
current requirements and anticipates future ones.

Rapid prototyping methods were occasionally used in VMS, but only after there was a high level
design. After the high level module design was in place, “quick and dirty” implementations of
selected components would be written that performed their basic function but lacked required
features, refinements, or performance. This allowed a functioning framework to be set up quickly, in
which multiple engineers could then replace the breadboard components with the real ones as they
were developed.
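
To illustrate the breadboard idea, here is a minimal Python sketch. It is not VMS code: the Scheduler interface, both implementations and run_framework are hypothetical names invented for the example. The only point is that the framework is written against the interface from the high level design, so a quick stand-in can later be swapped for the real component without touching the framework.

```python
from typing import Protocol


class Scheduler(Protocol):
    """Hypothetical module interface, fixed by the high level design."""

    def next_task(self, ready: list[tuple[str, int]]) -> str:
        """Return the name of the task to run; `ready` holds (name, priority)."""
        ...


class BreadboardScheduler:
    """Quick-and-dirty stand-in: ignores priority and takes the first task."""

    def next_task(self, ready: list[tuple[str, int]]) -> str:
        return ready[0][0]


class PriorityScheduler:
    """The 'real' component, developed later against the same interface."""

    def next_task(self, ready: list[tuple[str, int]]) -> str:
        return max(ready, key=lambda task: task[1])[0]


def run_framework(scheduler: Scheduler, ready: list[tuple[str, int]]) -> list[str]:
    """Framework code that depends only on the interface, not on a component."""
    ready = list(ready)  # work on a copy so the caller's list is untouched
    order = []
    while ready:
        name = scheduler.next_task(ready)
        order.append(name)
        ready = [task for task in ready if task[0] != name]
    return order


tasks = [("logger", 1), ("disk_io", 5), ("editor", 3)]
breadboard_order = run_framework(BreadboardScheduler(), tasks)
real_order = run_framework(PriorityScheduler(), tasks)
```

With the breadboard component the framework already runs end to end, just in arrival order; dropping in the priority-aware component changes the behaviour without any change to run_framework.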

Keep it simple
Many software projects have failed because the design team allowed the project to become too
complex. A small team and schedule pressure force the developers to focus on the essentials and
not digress into “nice to have” features. At the same time, the team must maintain the discipline to
not take shortcuts in the design process.

Make it modular
Modularity is the inevitable result of a well thought out design. It allows components to be changed
or replaced without affecting the rest of the system.

Modularity allows the overall design to be factored into understandable components. When you're
dealing with something as complex as an operating system, the only way to make it reliable is to
build it so it can be understood. You cannot test your way to reliability with any non-trivial piece of
software.

Hardware support – CPUs, systems and I/O devices – is one of the big modularity success stories in
VMS. The results speak for themselves. The rest of the system is equally modular, and many major
components were substantially enhanced or replaced outright during the life of the system.

Maintain the design


If you consider the design process to be a one-time trick, you’re likely to get stranded in code rot at
some point in time. Common causes of code rot are a lack of understanding of the design, functional
changes made without updating the original design, quick and dirty code changes, changes that go
against the design, and duplication of functions. If allowed to go unchecked, code rot will seriously
impact your ability to maintain your product and put out new releases. Problems that would have
been easy to pinpoint had the design been followed properly now become a search in a maze of
spaghetti code.

Documenting the design is the key to being able to maintain it. Even more important than
documenting how something works is to document why it is constructed the way it is. It is always
possible to reverse-engineer the how from the code (although it may be tedious). But discovering the
why after the fact is often impossible, and without it, it is often not possible to really understand a
piece of code. All code should also be documented reasonably well inside the source itself. Again,
focus on why the code is done the way it is.

Interfaces between major modules need to be well defined, and stable. If you feel you need to
change one of these interfaces, think carefully. Your change may break the interface for lots of other
modules that depend on it. Carefully check the design documentation to see if you’ve missed
another way to get what you need out of the interface. See if there’s a way you can do without the
change. If you still need to change the interface, change the design first and run it by those
responsible for the other modules using this interface. [4]

A very good idea is to validate inputs passed across interfaces. If you type your internal data
structures, and have your interface type-check all structures passed to it, you can catch many errors
early on. [5] One of the components in VMS that does this is the file system. All structures handed to it
are type-checked. As a result, the file system detected many of the pool corruptors in the early life of
the system.
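
A rough sketch of the idea in Python follows. It does not reproduce how the VMS file system does its checking; the type tags, the two structures and the file_size routine are assumptions made up for the illustration. Every internal structure carries a type tag, and the interface routine checks that tag before it touches anything else.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class StructType(Enum):
    """Hypothetical type tags; every internal structure carries one."""
    FILE_CONTROL_BLOCK = 1
    IO_REQUEST = 2


@dataclass
class FileControlBlock:
    file_name: str
    size_bytes: int
    struct_type: StructType = StructType.FILE_CONTROL_BLOCK


@dataclass
class IoRequest:
    target: str
    struct_type: StructType = StructType.IO_REQUEST


class StructureTypeError(Exception):
    """Raised when a caller hands the interface the wrong kind of structure."""


def check_type(struct: Any, expected: StructType) -> None:
    """Validate the tag before the module looks at any other field."""
    actual = getattr(struct, "struct_type", None)
    if actual is not expected:
        raise StructureTypeError(f"expected {expected.name}, got {actual!r}")


def file_size(fcb: Any) -> int:
    """Interface routine: type-check first, then do the real work."""
    check_type(fcb, StructType.FILE_CONTROL_BLOCK)
    return fcb.size_bytes
```

Calling file_size(FileControlBlock("a.txt", 512)) returns 512, whereas file_size(IoRequest("disk0")) raises StructureTypeError at the interface boundary, which is where the caller's mistake is easiest to trace.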

Despite one's best efforts, system components sometimes wear out due to code rot or just
significant changes in scale or requirements. Maintaining a modular design will allow you to rewrite
components completely without major impact on other areas of the system. Several components in
VMS were rewritten in this way, among them scheduling (rewritten a few times) and memory
management.

Testing
Create meaningful tests
As I said before, you cannot test your way to reliability. That doesn’t mean you shouldn’t test! Test a
lot, and make sure you exercise not just the mainline code but, especially, the error paths. Create
errors on purpose to exercise those paths. As an example, for VMS clusters, there are test mechanisms that introduce
faults in the cluster protocol layers, and these are exercised while the cluster is operating under
heavy load.
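
Fault injection on a much smaller scale can be sketched in Python as below. This has nothing to do with the actual cluster test mechanisms; FaultyLink, LinkError and send_with_retry are hypothetical names, and the 30% failure rate is an arbitrary assumption. A wrapper around the transport fails a fraction of the sends on purpose, so that the retry path of the code under test is exercised while many messages are flowing.

```python
import random


class LinkError(Exception):
    """Simulated transport failure."""


class FaultyLink:
    """Hypothetical transport wrapper that fails a fixed fraction of sends."""

    def __init__(self, failure_rate: float, seed: int = 0) -> None:
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)  # seeded so test runs are repeatable
        self.delivered: list[str] = []
        self.injected_faults = 0

    def send(self, message: str) -> None:
        if self.rng.random() < self.failure_rate:
            self.injected_faults += 1
            raise LinkError("injected fault")
        self.delivered.append(message)


def send_with_retry(link: FaultyLink, message: str, max_attempts: int = 50) -> int:
    """Code under test: its error path is the retry loop. Returns attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            link.send(message)
            return attempt
        except LinkError:
            continue
    raise LinkError(f"gave up after {max_attempts} attempts")


# Exercise the error path under load: many messages over a link that fails often.
link = FaultyLink(failure_rate=0.3)
messages = [f"msg-{i}" for i in range(1000)]
for message in messages:
    send_with_retry(link, message)
```

After the run, link.delivered should hold every message exactly once and in order, while link.injected_faults shows how often the error path was actually taken. Setting failure_rate to 1.0 tests the other error path, the one where the sender has to give up.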

Continuous regression testing


Build your product from scratch on a regular basis, and run automated tests against these builds. Nip
regressions in the bud.

[4] I’m making the assumption here that your documentation keeps track of what modules use what interfaces. It
does, doesn’t it?
[5] Strongly typed programming languages, like C++, Ruby, or Haskell, can help prevent many of these kinds of
problems at compile time or run time, but those benefits fly out the window when mixed with weakly typed (like
Basic) or untyped (like assembly) languages.

VMS (the complete operating system, utilities, associated products, etc.) is built from scratch weekly.
The Quality Test and Verification group installs every week's builds of the operating system and runs
it on a large number of servers. Those servers are used to perform specific regression tests as well as
to accomplish the day-to-day work of many people (email, notes, setting up test scripts, ad hoc
testing, etc.).
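
One small piece of such a setup, spotting what broke since the previous build, might look like the Python sketch below. The test names and results are invented, and find_regressions is a hypothetical helper, not a description of the VMS tooling. It compares the automated test results of two consecutive from-scratch builds and reports the tests that used to pass and no longer do.

```python
def find_regressions(previous: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Return tests that passed on the previous build but fail (or are missing) now."""
    return sorted(
        name
        for name, passed in previous.items()
        if passed and not current.get(name, False)
    )


# Hypothetical results from two consecutive from-scratch builds.
last_week = {"boot": True, "copy_file": True, "cluster_join": False}
this_week = {"boot": True, "copy_file": False, "cluster_join": False, "new_feature": True}

regressions = find_regressions(last_week, this_week)
```

Here only copy_file is reported: cluster_join was already failing last week, so it is a known bug rather than a regression, and a test that silently disappears from the results would be flagged as well.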

Employ in-use testing


The term in-use testing conveys that the testing is done almost automatically as the engineers do
their normal daily tasks. It is a good way to catch rare problems; things which might not occur when
performing a targeted test ten times [6], but which will eventually happen if you do something
hundreds or thousands of times without really thinking about it.
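
The arithmetic behind this is simple enough to check. The Python snippet below assumes, purely for illustration, a bug that strikes once per thousand operations and runs that are independent of each other; both the rate and the run counts are made-up numbers.

```python
def chance_of_seeing_failure(failure_rate: float, runs: int) -> float:
    """Probability that a failure with this per-run rate shows up at least once."""
    return 1.0 - (1.0 - failure_rate) ** runs


rate = 1 / 1000  # assumed: the bug strikes once per thousand operations
targeted = chance_of_seeing_failure(rate, runs=10)    # a targeted test, run ten times
in_use = chance_of_seeing_failure(rate, runs=5000)    # daily use by many people
```

Under these assumptions the targeted test has roughly a 1% chance of ever showing the problem, while five thousand everyday operations will reveal it with a probability of over 99%.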

You may need to get creative with this. If you’re developing an operating system or development
environment it’s relatively easy to use them to do your daily work. If you’re developing business
software, use it to run your business. If you’re developing an airline reservation system, and you’re
not an airline, you might build an adapted version of your software for meeting room reservations.
Try to use your own software on a daily basis. [7]

Procedures
Maintain a proper workflow
A major pitfall in software development is to take shortcuts, especially when the stakes are high and
customers are clamoring at the gates for a solution.

However, the bigger the change, the more important it is to follow the proper steps. A typical,
rigorous workflow for changes, such as the one used by VMS engineering, would look like this:

• Detailed problem statement;
• Design specification, where possible with multiple solutions;
• Design review: detailed discussions about the advantages and disadvantages of each of the
proposed solutions. The collective effort of the design review may result in an even better
design than those proposed;
• Prototyping, to verify that reality matches up with the design. One result of this step is a set
of test cases to stress test the design;
• Another meeting, to discuss the prototyping results;
• Implementation of the solution, and its integration into the current development stream;
• Code review; for large changes, there are formal code review meetings;
• Code check-in, under control of the build managers;
• Incorporation of unit tests into the standard set of tests to run;
• Documentation, written by the documentation team if required by the change;
• Documentation review before the release goes out.

[6] Targeted tests are often written to test how the programmer expects the software to be used. Deviations
from the programmer’s intended use are hard to take into account in deliberate testing, but occur naturally in
everyday use.
[7] Don’t stretch this too far, of course. If you’re developing control software for nuclear power plants, it would
be silly to adapt it to run your coffee machine. However, if you’ve developed an underlying framework suitable
for a variety of measurement and control applications, you might consider using the framework for building
automation tasks in your office. You’ll know you’ve made a mistake if it suddenly starts freezing after you’ve
checked in your code.

Quick and dirty fixes for a specific customer are fine, but should be temporary. For general
distribution and inclusion in the main code stream, a long-term fix should be developed afterwards
that follows established procedures. Sticking to these procedures slows down the development
process in the short run but leads to a higher standard of quality, which will pay for itself in the long
term.

Err on the side of caution


Take your time for design decisions. When needed, don’t hesitate to create committees for them;
this is done to get a wider audience and to get input from many areas. One example in VMS was the
calling standard committee that consisted of folks from compilers, exception handling, debugger,
kernel tools, runtime libraries etc. For the IPF calling standard, this group had bi-weekly meetings for
a year or so, to make sure they got the implementation of the Intel calling standard on VMS right.

Perform peer reviews


It is good practice to perform peer review at all levels – specification, design, and code. A review
proves that the designer actually understands the problem and the design, and allows others to
contribute ideas to the solution. The earlier bugs are found in the design and coding process, the
easier they are to find and the faster and cheaper they are to fix.

All significant functional changes and additions are discussed at length between two or more
developers. The goal is to identify design weaknesses sooner rather than later, find simpler or faster
ways to meet the requirements, and spread the technical knowledge to multiple people.

Most code changes are reviewed by at least one engineer other than the author of the change. All
sorts of potential problems are caught by code reviews, from significant (though often subtle) design
or implementation errors to trivial mistakes that were not caught in testing. For large changes,
formal code review meetings are called. Performing code reviews slows things down, no doubt about
that, but in the long run the problems caught early more than repay the time spent.

A final code review is often performed using the documentation that the engineer doing the check-in
has prepared, which allows another engineer to verify that the code being checked in matches
expectations and that the documentation is sensible.

Manage your code streams


Depending on the nature of your project, you may have many streams of code available at any given
moment. In VMS engineering, these usually number between 6 and 8, representing the in-
development release of VMS plus a few past releases. In some cases the code in a given source
module is identical across a number of streams; in other cases there are unique variants in different
streams. The possibility of errors when performing check-outs and check-ins is of course real, but it
can be minimized by having a standard set of tools and operating procedures which all engineers
who perform check-ins follow.

A "release manager" has authority to approve or reject proposed changes, based on schedule
factors, levels of technical risk, etc.

A team of "builders" is in ultimate control of the master source code streams. All check-ins follow a
two-stage process. First, the engineer performing the check-in queues a request for it. Then a
member of the build team reviews the queued requests and releases check-ins from the queue to
actually be performed, thus modifying the master source code database [8]. A build team member
might delay finalizing a check-in until some known issue elsewhere in the system is resolved (to
avoid adding confusion), or might even reject a check-in if the preparation is incomplete or
sufficiently non-standard.
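
The two-stage process can be modelled in a few lines of Python. This is only a sketch of the workflow described above, not of any real source control tool: MasterStream, CheckIn, the Status values and the engineers' names are all hypothetical. Engineers can only queue requests; the master stream changes only when a builder releases a request from the queue.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    QUEUED = "queued"
    HELD = "held"
    APPLIED = "applied"
    REJECTED = "rejected"


@dataclass
class CheckIn:
    engineer: str
    module: str
    description: str
    status: Status = Status.QUEUED


@dataclass
class MasterStream:
    """Hypothetical master source stream; only the build team modifies it."""
    applied: list[CheckIn] = field(default_factory=list)
    queue: list[CheckIn] = field(default_factory=list)

    def request(self, check_in: CheckIn) -> None:
        """Stage one: the engineer queues the check-in; the master is untouched."""
        self.queue.append(check_in)

    def review(self, check_in: CheckIn, decision: Status) -> None:
        """Stage two: a builder applies, holds, or rejects a queued request."""
        if check_in not in self.queue:
            raise ValueError("check-in is not waiting in the queue")
        if decision is Status.QUEUED:
            raise ValueError("a review must apply, hold, or reject")
        check_in.status = decision
        if decision is Status.HELD:
            return  # stays queued until the blocking issue is resolved
        self.queue.remove(check_in)
        if decision is Status.APPLIED:
            self.applied.append(check_in)


stream = MasterStream()
fix = CheckIn("alice", "scheduler", "fix priority inversion")
rework = CheckIn("bob", "file_system", "incomplete check-in notes")
stream.request(fix)
stream.request(rework)
stream.review(fix, Status.HELD)        # wait for a known issue elsewhere
stream.review(fix, Status.APPLIED)     # issue resolved, release from the queue
stream.review(rework, Status.REJECTED)
```

At the end only the first check-in has reached the master stream, and the queue is empty again. The design choice worth noting is that there is no code path by which an engineer's request modifies the master directly.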

Keep to high standards


• For new features, never break old functionality.
• Strong focus on quality. If it doesn’t pass your quality criteria, don't ship the release.
• If you say you support something, your policy should be that you test it.

Fix your bugs


Take your problem tracking system seriously. Organize scrubbing meetings, which all engineers
should attend, whenever a major release is due.

It's always more fun to create something new than to maintain the existing stuff. Be prepared to stop
forward development in favor of fixing bugs.

Design the software so that, if it fails, you can capture enough detail to diagnose the
failure. The system crash dump and the process dump features are excellent examples. There is also
logging code sprinkled through a number of VMS components, which in some cases can be activated
by the customer to gather additional information [9].
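
In application code the same principle can be approximated with a small in-memory trace buffer that is written out only when something goes wrong. The Python sketch below is an illustration under that assumption and is unrelated to the VMS dump formats or SDA; FlightRecorder, process and run are hypothetical names, and the order-processing scenario is invented.

```python
import collections
import json
import time
import traceback


class FlightRecorder:
    """Hypothetical in-memory trace buffer, dumped only when something fails."""

    def __init__(self, capacity: int = 100) -> None:
        self.events = collections.deque(maxlen=capacity)  # oldest entries drop off
        self.verbose = False  # extra tracing that a customer could switch on

    def record(self, message: str, detail: bool = False) -> None:
        if detail and not self.verbose:
            return  # detailed events are skipped unless verbose tracing is on
        self.events.append({"time": time.time(), "message": message})

    def dump(self, error: BaseException) -> str:
        """Return a JSON 'crash dump': the error, its traceback, recent events."""
        return json.dumps({
            "error": repr(error),
            "traceback": traceback.format_exception(
                type(error), error, error.__traceback__),
            "recent_events": list(self.events),
        }, indent=2)


recorder = FlightRecorder(capacity=3)


def process(order_id: int, quantity: int) -> float:
    recorder.record(f"processing order {order_id}")
    recorder.record(f"quantity={quantity}", detail=True)
    return 100 / quantity  # fails when quantity is 0


def run(orders):
    """Process all orders; on failure, return the dump instead of losing context."""
    try:
        for order_id, quantity in orders:
            process(order_id, quantity)
    except Exception as error:
        return recorder.dump(error)
    return None
```

Calling run([(1, 5), (2, 0)]) returns a dump that names the ZeroDivisionError and lists the events leading up to it. Setting recorder.verbose to True before the run adds the detailed quantity events, which is the rough equivalent of logging that the customer can activate.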

Conclusions
A lot of the recommendations in this article can be summed up as “maintain a high standard of work
at all times,” and a lot of that can be traced back to maintaining a proper engineering culture. A
culture like that does not grow overnight; it must be cultivated. It must also be maintained, because
although a well-functioning team strengthens itself and builds up a remarkable resilience, too many
poor management decisions may eventually lead to even the strongest team breaking down.

Acknowledgements
A special thanks to those who provided me with their thoughts on engineering. Lots of your
contributions made it into this article almost verbatim; my work was mostly editorial in nature. My
apologies for keeping you waiting for the article for so long.

[8] Depending on the source code repository tool you use, this can be achieved in a number of different ways.
The use of entirely separate developer and master repositories may be called for. I personally find Mercurial,
which has separate repositories by design, well suited to this type of workflow.
[9] E.g., the various SDA extensions such as EXC and FLT.
