Management Pseudo-Science
Not everything that looks like science is science
A long time ago, someone posed a question to the development team I was working on: “How many lines of code are in this application?” Some external stakeholder needed the number for some unspecified reason. No one knew the answer, but one developer jumped at the opportunity to knock out a script to iterate over all the source code files in the application and get a simple line count after filtering blank and comment-only lines (there was no standard LOC counter available at the time). The next day, I asked him how many lines of code his script had counted. He said, “40 thousand!” I said, “Wow, that's a lot bigger than I would have guessed.” I also asked him if he had excluded sin_cos_table.c from the count. The answer was no, and I went on to explain that the file consisted of nothing but hardcoded sine and cosine lookup tables someone had generated with a simple script (yes, people used to do this kind of thing to save CPU cycles), and that it was 20 thousand lines long. I wondered whether the person who needed this LOC number would care that half of the reported count was really just easily generated table data. I realized that day just how terribly inaccurate software engineering metrics can be without proper care.
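Even a quick script like that one forces judgment calls: which files count as source, what counts as a comment, and whether generated data belongs in the total. Here is a minimal sketch in Python of that kind of ad hoc counter, with illustrative assumptions about the file extensions, comment prefixes, and excluded generated files:

```python
#!/usr/bin/env python3
"""Rough LOC counter: skips blank and comment-only lines.

A sketch of the kind of ad hoc script described above; the extensions,
comment prefixes, and exclusion list are assumptions, not a standard
LOC definition (multi-line comment bodies are only crudely handled).
"""
from pathlib import Path

EXTENSIONS = {".c", ".h"}                    # which files count as source
EXCLUDED = {"sin_cos_table.c"}               # generated data, not hand-written code
COMMENT_PREFIXES = ("//", "/*", "*", "*/")   # crude comment-only-line filter


def count_loc(root: str) -> int:
    total = 0
    for path in Path(root).rglob("*"):
        if path.suffix not in EXTENSIONS or path.name in EXCLUDED:
            continue
        for line in path.read_text(errors="replace").splitlines():
            stripped = line.strip()
            if stripped and not stripped.startswith(COMMENT_PREFIXES):
                total += 1
    return total


if __name__ == "__main__":
    print(count_loc("."))
```

Every one of those choices moves the answer, sometimes by a factor of two, as the lookup-table file showed.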
In software development, there are countless examples where a metric is taken in a haphazard or even inaccurate way and then presented by or for management along with some kind of narrative. For years now, I have been calling this management pseudo-science. It's about sophisticated-looking PowerPoint charts full of graphs and impressive-sounding jargon, all with a tenuous connection to reality at best. It has the illusion of real engineering without the substance.
The problem isn't metrics in general; it's bad metrics that look like good ones. The goal is to elevate our use of metrics to the level of substantive engineering to support management in the difficult job that they do every day. In this article, I will share some personal experiences where we have unknowingly practiced management pseudo-science and offer suggestions for how to avoid these tendencies.
Task Hours
Sometimes software developers are asked to record the amount of time that they spend on every task in their work management system. By task-level tracking, I mean tracking against relatively small units of work, usually anywhere from under an hour to several days. These time records are distinct from those recorded in a time card system, where time is tracked against a project or contract, possibly even linked to billing and/or payroll systems. Such task-level time tracking is notoriously inaccurate. The problem is that it assumes developers focus mainly on a curated set of well-defined tasks. The reality is that developers are often discovering additional tasks as the work progresses. They are faced with the decision of either lumping that time in with the previously documented tasks, creating new tasks, or simply letting that time fall into the abyss of untracked time. Results vary widely. Also, developer time is often far more fragmented than management would care to admit. Developers often have to context switch among multiple high-priority tasks, help teammates with their work, answer questions from others, attend meetings, and handle miscellaneous distractions (or create their own).
In contrast, higher-level time card tracking can be quite accurate provided there is training that conveys the reasons for the system, standard practices for time tracking, a culture of respect for accuracy, and punitive consequences for deliberate or negligent inaccuracy. Without such measures, even billable hours can be grossly inaccurate.
Some level of time tracking is often necessary for business. But if developers are simply told to track time on small tasks, then you really have no idea what quality of data you're getting.
Cloud Expenses
It seems that reporting cloud expenses would be a hard one to screw up. I mean, you get a bill from your cloud provider that is accurate to the penny, and you have virtually unlimited ability to tag and group resources in any way you want for cost reporting. Nevertheless, I was once working with an organization that was running several different applications, some of which I knew very well. I had heard that someone took the monthly cloud bill total and divided it by the number of applications we ran to determine the cost of each application individually. I winced in horror upon hearing this. One application had just two modestly spec'd production servers with a small database. Another had several high-spec servers and a database ten times larger. And a third was set up for autoscaling in Kubernetes. Splitting the bill evenly across those three said nothing about what any of them actually cost.
Cloud providers offer easy-to-use tools like resource tagging to track costs accurately per application, per environment, or however you want to break things down.
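To make that concrete, here is a minimal sketch that pulls a month of costs grouped by a cost-allocation tag using AWS's Cost Explorer API via boto3. The tag key ("application"), the date range, and the metric are illustrative assumptions; other providers offer equivalent cost-reporting APIs and consoles.

```python
"""Break down a month's AWS bill by an 'application' cost-allocation tag.

A sketch using the Cost Explorer API via boto3; the tag key, date range,
and metric are assumptions. The tag must be activated as a cost-allocation
tag before it shows up in billing data.
"""
import boto3

ce = boto3.client("ce")  # Cost Explorer

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-05-01", "End": "2024-06-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "TAG", "Key": "application"}],
)

for group in response["ResultsByTime"][0]["Groups"]:
    app = group["Keys"][0]  # e.g. "application$inventory-service"
    cost = float(group["Metrics"]["UnblendedCost"]["Amount"])
    print(f"{app:40s} ${cost:,.2f}")
```

A useful side effect: resources missing the tag typically show up under an empty tag value, which makes gaps in tagging discipline visible instead of silently skewing the per-application split.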
Lines of Code
I used to work at a large company, in the days before widespread agile development, that used a concept of “equivalent lines of code” (or “ELOC” for short) to estimate the number of development hours a project would take based on requirements. ELOC included the new lines of code to be developed and the lines to be modified. In addition, it included a fraction, say 10%, of all lines of code in the files being touched. Furthermore, it also included a smaller fraction, say 2%, of all lines of code in the rest of the library or executable, that is, the files not being touched. So, if a small project was updating a single library and was estimated to add or modify about 500 lines of code, the estimate might come out to something like 900 ELOC. The more the changes were sprinkled throughout the component rather than concentrated in a few small files, the higher the ELOC would be. Despite all the well-known issues with LOC-based estimation, this system actually worked fairly well in practice. It captured, at least crudely, that larger components take more effort to maintain, and that context-switching across many files and components increases development time.
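As a worked example, here is the calculation as I understand it. The 10% and 2% factors come from the description above; the file-size totals (2,000 lines across the touched files, 10,000 lines in the rest of the library) are hypothetical numbers chosen only to reproduce a figure of roughly 900 ELOC:

```python
def estimate_eloc(new_or_modified: int,
                  touched_file_loc: int,
                  untouched_component_loc: int,
                  touched_factor: float = 0.10,
                  component_factor: float = 0.02) -> float:
    """Equivalent lines of code, per the scheme described above.

    new_or_modified:         lines to be added or changed
    touched_file_loc:        total lines in the files being touched
    untouched_component_loc: total lines in the rest of the library/executable
    The 10% and 2% factors follow the article's example values.
    """
    return (new_or_modified
            + touched_factor * touched_file_loc
            + component_factor * untouched_component_loc)


# Hypothetical totals chosen to reproduce the ~900 figure:
# 500 new/modified lines, 2,000 lines in the touched files,
# 10,000 lines elsewhere in the library.
print(estimate_eloc(500, 2_000, 10_000))  # 500 + 200 + 200 = 900.0
```

Spreading the same 500 lines across more files and more components raises the touched-file and component terms, which is how the scheme penalized scattered changes.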
Where the system really veered into management pseudo-science was in the post-project measurement of ELOC. Management wanted to see the actual ELOC, measured after the project was completed, compared to the original estimates, which is a perfectly reasonable thing to do. The trouble was that measuring ELOC with the tools of the time was fraught with peril.
Measuring ELOC meant that you had to identify every file in every library and executable that was touched by a project over the course of its development (which could be years). By that point, the code for the project had already been integrated onto a branch along with other projects and numerous bug fixes. There was much ambiguity in deciding which project a given change belonged to, whether to use the current or original file sizes, and whether the original file sizes at the start of the project could even be determined. Ultimately, producing the actual ELOC number was a game of guesswork and voodoo.
No estimate of the error in these numbers was ever produced. Typically, a senior developer was asked to produce them by the end of the week and told not to spend too much time doing so. The data was put onto highly polished PowerPoint charts. Various tales were spun to explain it, such as “our ELOC estimates were good but our hours were high because we had several new developers on the team” or “our ELOC estimates were off because none of the original developers of XYZ library are with the company anymore”, and so on. No one ever stopped to ask just how much noise was being injected into these postmortems by the inaccuracy of the measured ELOC.
The idea of collecting real metrics and comparing them to estimates is fantastic. But a metric has to be clearly defined and consistently measurable with the tools available; otherwise, it's just pseudo-science.
Conclusions
My suggestions for avoiding pseudo-science are these: Decide which metrics are important to your organization. Define them so there is a common understanding of what they mean. Devise a standard, consistent way to measure them. Understand the degree to which each metric provides a limited or comprehensive view of reality. And present metrics with an indication of the degree of error or uncertainty around them.
We are human beings. It is our nature to tell stories. But we are also engineers. It is our nature to want to understand reality as it stands and to be transparent about our uncertainties, inaccuracies, and flaws.
Acknowledgements
ChatGPT was used in the preparation of this article for light proofreading, stylistic suggestions, and formatting.