
What We Talk About When We Talk About Modularity
No dogma, just what I'm learning and thinking about right now.
Comments and feedback are welcome on Mastodon.
"If you're thinking without writing, then you just think you're thinking."
—Leslie Lamport
This is the second of three articles discussing modularity patterns in Ruby (with apologies to Raymond Carver).
What is modularity and how does it help solve the problem?
Recap
In the previous article I argued that increasing complexity is inevitable in most systems for three reasons:
- Interaction with the real world drives the evolution–and growth–of the codebase.
- The number of possible connections between objects in the system grows exponentially.
- The burden of maintaining shared mental models within the team grows with the system.
In this article we will look at the different strategies used to combat complexity over the history of software engineering–including what worked and what didn’t work.
Nothing New Under the Sun
The problem of creeping complexity is not unique to Rails or Ruby, nor is it a new problem. In fact, it’s surprising how long this topic has been debated–at least it was to me. Fred Brooks addressed the matter in his classic essay collection, The Mythical Man-Month. This book, first published in 1975, was primarily based on his experience managing the OS/360 development project at IBM between 1964 and 1966. Despite the book’s sometimes head-scratching takes and rampant Mad Men-style misogyny, it contains important insights still very much relevant today.
Brooks describes the complexity problem in terms of communication and coordination–very much in the spirit of the “shared mental model” discussion in the previous article. His prescription for this problem: cultivate perfect communication. Brooks argued that the teams must be encouraged to communicate with each other in as many ways as possible, from informal phone calls (the Slack/email of the day) to regular formal meetings (presumably all-hands). But the most important vehicle of communication would be the “project workbook”:
All the documents of the project need to be part of this structure. This includes objectives, external specifications, interface specifications, technical standards, internal specifications, and administrative memoranda.
(from “Why Did The Tower of Babel Fail?,” The Mythical Man-Month)
This project workbook would be a feat of administrative engineering unto itself. It would need to be updated daily in order to “ensure that relevant information gets to all people who need it.” To accomplish this (remember, this was 1964), they utilized a “computer-driven text-editing system” (fancy!) and then used offset printers to issue paper copies to every programmer (there were about 1,000 programmers eventually). The workbooks were in loose-leaf binders, and the programmers were all expected to swap out the changes daily–and to read them.
Let that sink in: every programmer, every day, reading the changes and filing the changes. Remember the shared mental model we discussed in the previous article? Brooks wanted every programmer to share the same mental model of the entire system. Updated daily. 😃
The magnitude of this effort was apparent only six months into the project, when Brooks noted that the workbook had grown to five feet thick. Faced with this obvious absurdity, he did the only thing a sensible IBM manager could do: he switched from paper to microfiche and kept piling it right on (“saving” a million dollars in the process, he noted).
At that point in the project, daily changes alone numbered around 150 pages. Summaries of the daily changes were now written, and programmers were expected to read the summaries every day, and to read the changes themselves if relevant to their own work.
The programmer would probably read [the change summaries] daily, but if [they] missed a day [they] need only read longer the next day. As [they] read the change summary, [they] could interrupt to consult the changed text itself.
Amazingly, even as he wrote The Mythical Man-Month some ten years removed from the OS/360 project, Brooks still believed that his formula for perfect communication was the only reasonable approach–his only improvement would be to use “display terminals” in place of paper or microfiche. He was aware of other ideas, but rejected them as unworkable.
D. L. Parnas of Carnegie-Mellon University has proposed a still more radical solution. His thesis is that the programmer is most effective if shielded from, rather than exposed to, the details of construction of system parts other than [their] own. This presupposes that all interfaces are completely and precisely defined. While that definitely is sound design, dependence upon its perfect accomplishment is a recipe for disaster. (emphasis added)
Luckily, my copy of The Mythical Man-Month is the Anniversary Edition, released in 1995. Given 20 more years to think about things, Brooks had reconsidered some of his earlier positions. The Anniversary Edition includes a new chapter discussing which of his ideas had stood the test of time and which had not. In this chapter, “The Mythical Man-Month after 20 Years,” he states:
Parnas was right, and I was wrong. I am now convinced that information hiding, today often embodied in object-oriented programming, is the only way of raising the level of software design.
Interesting. Let’s see what this information hiding thing is all about.
Information Hiding
Writing in 1971, David L. Parnas was well steeped in the “perfect” documentation approach that Brooks described, but he had his doubts.1 His first criticism was that excessive design documentation stifled the ability of programmers to explore solutions that did not fit the up-front design. This point should sound familiar to all of us living in a post-Agile Manifesto world.
Beyond this point he seemed to struggle for the words to describe his position, so he resorted to an allegory (of sorts) of the good programmer vs. the bad programmer. “Good” programmers know the whole system and “efficiently” exploit useful subroutines from other modules that can make their present job easier. “Poor” programmers, on the other hand, “are not clever enough to notice” such opportunities.
However, Parnas noted, these interconnections created by “good” programmers do not benefit the system as a whole: they “increase the connectivity of the structure without appreciably improving its performance.” In fact, the proliferation of connections between modules decreases the extensibility and maintainability of the system: “We have found that a programmer can disastrously increase the connectivity of the system structure by using information [they possess] about other modules.”
Parnas was not ready to fire all the good programmers to correct the problem. Instead he proposed a novel approach that he would later call “information hiding”: “We should not expect a programmer to decide not to use a piece of information, rather [they] should not possess information that [they] should not use.” Parnas called these hidden internal implementation details “secrets.”
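In Ruby terms, Parnas' "secrets" map naturally onto method visibility. Here is a minimal, hypothetical sketch (the `ReportGenerator` class and its line format are inventions for illustration): the public method is the only thing other code may depend on, while the formatting rule is a secret that callers cannot even reach.

```ruby
# Hypothetical example: a class whose formatting rule is a "secret" --
# an internal detail no other module is allowed to depend on.
class ReportGenerator
  # The public interface: the only part other code may rely on.
  def generate(entries)
    entries.map { |entry| format_line(entry) }.join("\n")
  end

  private

  # A "secret": callers cannot invoke this, so we remain free to
  # change the line format without breaking anyone.
  def format_line(entry)
    "#{entry[:date]} | #{entry[:amount]}"
  end
end
```

A "good" programmer in Parnas' allegory might notice `format_line` and want to call it directly from another module; `private` ensures they do not possess that option, not merely that they decline to use it.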
So What is Parnas Talking About?
Let’s take a step back and look at the environment in which Parnas and Brooks were operating. Both were working on so-called “large applications,” for lack of a better term. These were not projects that started small and grew large, they were projects with hundreds or even thousands of programmers from inception. To manage these projects, the then-popular strategy was to split the system into “modules.” The idea was that modules would allow teams to work in parallel, would permit flexibility in development (modules could be refactored independently), and would aid in comprehensibility (smaller building blocks are easier to understand).2
To be clear, the term “module” at this point meant little more than a “piece of the system,” as envisioned by the system architects. Since no further guidance seemed to be given around how to develop individual modules, programmers were free to create connections between them whenever and wherever they saw fit. As these connections multiplied, it’s no surprise that Brooks and his contemporaries sought to tame the growing chaos through “perfect” documentation.
Parnas’ insight was that this proliferation of interconnections was itself the cause of the cumulative, ever-increasing friction in these projects, making them harder and harder to change and consistently driving them over schedule and over budget. His solution was to cut all but the most essential connections, to define the remaining connections around a stable “interface,” and to hide all internal implementation “secrets” from other modules.
We can avoid many of the problems discussed here by rejecting the notion that design information should be accessible to everyone. . . . We can avoid . . . conflict by designing the external interface, using it as a check on the remaining work, but hiding the details that we think likely to change from those who should not use them.1
Thus, the “module” became more than just a convenient term for a “piece” of the system. It became an actual boundary object. This module-as-boundary-object is the concept we are interested in here.
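Ruby gives us a direct way to express this module-as-boundary-object. In this hypothetical sketch (the `Billing` module and its collaborators are invented names), the module commits to a single public entry point and uses `private_constant` to keep its internal collaborator a secret of the boundary:

```ruby
# Hypothetical sketch of a module-as-boundary-object: Billing exposes
# one stable public interface and hides its internals.
module Billing
  # The public interface: the one connection point other modules may use.
  def self.total_due(line_items)
    Calculator.new.sum(line_items)
  end

  # An internal collaborator -- a "secret" of the Billing module.
  class Calculator
    def sum(line_items)
      line_items.sum { |item| item[:cents] }
    end
  end
  private_constant :Calculator
end
```

Other modules can call `Billing.total_due(...)`, but referencing `Billing::Calculator` from outside raises a `NameError`: the boundary is enforced by the language, and `Calculator` can be renamed, split, or rewritten without any coordination beyond the module itself.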
What’s the Verdict?
I think it’s clear that history has come down in favor of Parnas’ intuition that “information hiding” and defined interfaces, applied together, are the most effective strategy for taming the proliferation of interconnections within complex systems. Indeed, we already know that Brooks himself conceded as much. But is this really enough? Are “secrets” and interfaces really all we need?
When Parnas began to implement projects using the principle of information hiding, he noticed right away that duplicate functionality was present in different modules. This was, in accordance with his preference for “poor programmers,” to be expected: some duplication was the price of battling complexity. Interestingly, however, when Parnas began to explore the practice of composing modules so as to minimize the need for interconnections and duplicate functionality, he found that his compositions tended to look very different from those designed by the system architects: the modules seemed to be composed more by function than by theoretical role in the system.2
This raises a pointed question: what if the proliferation of interconnections Parnas sought to avoid was always just a symptom of a deeper problem, namely poor module design? What if Parnas’ true breakthrough was not just the tactic of information hiding, but a better principle for decomposing systems into modules?
What name should we use for this new design principle Parnas had discovered? I believe it is best described as autonomy. In fact, if we shift our perspective to focus on autonomy, we see that information hiding and defined interfaces are actually derived properties of autonomy: true autonomy requires that modules must be free to refactor their internals at will, and must only commit to limited, slow-to-change public interfaces.
In the final article in this series we will discuss further, practical implications derived from the composition of modules based on the guiding principle of autonomy.
Thank you for reading! Discussion, feedback, and corrections are welcome on Mastodon.