Total Internal Reflection

An Ode to the Generalist

2023-03-13T00:00:00+05:30

This post is probably a spiritual successor to Resilient Knowledge Bases.

I fear for the death of the Generalist.

The Generalist is characterised not by his extreme capability to specialise in one particular area, though that is not an uncommon trait. He is characterised by his ability to shapeshift into whatever form derives the maximum value in a particular set of circumstances. Business Analyst not around? He can write good stories, and hold the fort well enough until the BA returns. QA not around? He can devise a reasonable QA strategy. A project needs some UX practices in place? The Generalist will understand enough from first principles, as well as digest literature on the latest library, and produce something halfway decent. None of these cases imply that the Generalist stands alone. He knows his limitations full well.

The Generalist is a Born Troubleshooter. This comes from his experience in having to figure his way out of numerous unfamiliar problems as they struck. He will be the first person you turn to on your project when the build is broken at 5 pm on a Friday evening, and you cannot figure out why it’s breaking. To be fair, the Generalist will not have all the information that is needed to redeem the situation. This leads us to the next trait of the Generalist…

The Generalist knows the Right Questions to ask. He is not infallible, and he is not omniscient. He knows his limitations, especially when confronted with an unfamiliar domain. But he has built a mental framework which he can use to zoom in and out of the situation to map out an unfamiliar terrain or problem space. You may or may not have heard of a Role Playing Game system called Microscope. In it, players collaborate in building the history of a (usually fictional) world or universe. The flexibility of the system comes from the fact that at any point, a player can zoom out to describe events of a historical scale, like the rise and fall of a civilisation; equally, he can zoom in to describe events of a single (fictional) person’s day and how that ultimately affected the outcome of a large-scale event. That is how the Generalist operates; sometimes he will ask 10,000-foot-level questions; sometimes he will ask why the contents of a particular register changed from 0x20 to 0xFF. These are not random questions; he is simply figuring out the lay of the land, particularly the interesting spots.

The Generalist always has a Plan, and he lays it out freely. He knows that one of his primary objectives is to help others – who have more context of the problem, and have more knowledge – reach conclusions or resolutions. More often than not, he will sketch out his thinking to others, encouraging them to fill in the gaps and point out the loopholes. He will want you to realise that “Oh, my event history ordering is incorrect because sometimes the events are reaching the database out of order, and OMG I can’t depend on server clocks to establish causality”. That insight came from you, not him; his MO is to state the evident facts and build a chain of reasoning to help you – the expert – reach the conclusion that was already present in your head, but not accessed. The Generalist is thus a Team Player.

The Generalist has his Fundamentals firmly in place. He is not buffeted by the whims and fancies of the latest frameworks and libraries. That is not to say that he does not learn these, or is immune to the charms of a particularly compelling programming language. But he does not despair simply because he has not used the latest technology on his project. He makes bets on things that will force him to expand his dictionary of fundamentals, and learns those. He may not remember the latest APIs, but he understands the spirit behind them, and is fully capable of jumping into the saddle of hands-on implementation if called for. But the Generalist is always Learning. If he has yet to understand Functional Programming fully, he will attempt to incorporate that thinking into his toolbox. If he feels that learning Vim is worth it, because it is usually the lowest common denominator on all Unix-like operating systems, he will do it. He will select a technology, pluck the hardy core idea behind it, and file it away for the future.

The Generalist is a fierce Specialist in his preferred area of specialisation. He may be known for his expertise in this area, but he does not let this define him. He hones his knowledge and capability in this area with a single-minded fervour, because he knows that if something is worth doing, it is worth doing extremely well. He does this not to show off, but because he believes in attaining some semblance of mastery in his chosen discipline.

I fear for the death of the Generalist.

Economic Factors in Software Architectural Decisions

2023-02-20T00:00:00+05:30

This article continues from where Every Software Engineer is an Accountant left off. I have had feedback that I need to make my posts on this topic a little easier to follow; I will attempt to do that here.

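The posts in this series of Software Engineering Economics are, in order:

Every Software Engineer is an Economist
Every Software Engineer is an Accountant
Economic Factors in Software Architectural Decisions (this one)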

Introduction

In previous articles, we have spoken of examples of doing NPV analysis for architectural and technical decisions, to determine viability and bubble up the tangible value of these intangible decisions to senior stakeholders. However, apart from examples, we have mostly glossed over what sort of economic factors should be considered when assigning value to these decisions. As it turns out, this is not hard: these economic factors are very closely tied to the factors we use to judge the technical benefits and costs of these decisions. We mostly need to tie them to actual financial value, in terms of hours and, ultimately, money. Below, we list tables of economic factors to consider for common architectural decisions.

In parallel, we need to measure these costs and benefits relative to some baseline. Thus, we propose certain baselines to judge common architectural decisions against.

There is also an important point we implicitly assumed: that implementing these decisions in code will automatically give us this value. However, there needs to be some arbiter of whether this value was actually delivered. Architectural decisions require effort, and the judgement of whether that concrete effort achieved everything we set out to do must be supported by something similarly concrete. We argue for feature-level tests as arbiters of value; given that almost all teams use feature tests to verify that the software is fit for purpose, they seem a natural place to assign economic value. We use the term “feature tests” rather loosely; these could test functionality, as well as verify the performance of these features. Any test that can demonstrate an aspect of the solution to which the business has assigned explicit value falls into this category of “feature test”.

Baselines

Decisions need to be taken at multiple levels of abstraction of a codebase. Some examples are:

Should I rename this variable or not?
Should I refactor this piece of code into its own function or not?
Should I apply this design pattern (e.g., factory) or not?
Should I implement tracing or not? (You really should :-)
Should I use a plugin architecture or not?
Should I break this out into its own microservice or not?
Should I use Kafka or Google PubSub?
…and so on

To enumerate the costs and benefits of these decisions, we need to calculate them relative to some baseline implementation. This baseline implementation may or may not already exist, but it serves as a useful yardstick to drive out all the benefits that would occur if the decision was taken, all the future problems which would occur if the decision was not taken (which would ultimately translate to financial losses), and the costs involved in implementing this decision.

As programmers, we make many lower-level decisions over the course of a programming session with an intuitive understanding of the benefits of taking a particular decision (renaming a variable to be more descriptive ultimately helps readability for others – current and future – working on the codebase). This is fine; we don’t really need to evaluate the economic value of every small decision where the cost of making the change is vanishingly small, thanks to modern refactoring tools.

The decisions start to matter at higher levels of abstraction: at the architecture level, at the service level, and so on. Changes at these macro levels occur less often and require greater effort: new deployments, additional dependency fix-ups, etc. Decisions at these levels thus benefit the most from explicit economic evaluation, and these are the places where a baseline helps.

We thus propose the following baselines for some frequently occurring decisions:

Monoliths as baseline when considering microservices
Hardcoded plugins as baseline when considering microkernel architectures
Peer-to-peer invocations as baseline when considering event-driven architectures / event buses
Hardcoded components as baseline when considering pipe and filter patterns
RDBMS (something like PostgreSQL) as baseline when considering NoSQL databases

Each of the above decisions has one or more expansion factors: these are the factors that make taking the decision potentially worthwhile. For example, if there was no need for future plugins to extend or add new functionality, there would be no need for a microkernel architecture; the number of future extensions is thus an expansion factor for this decision. If the list of components in a processing pipeline did not change at all, there would be no need for a pipe and filter pattern; the future configurability of components is the expansion factor for this decision.
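To make the idea of an expansion factor concrete, here is a minimal sketch that treats the number of future plugins as the expansion factor and finds the break-even point between a hardcoded baseline and a microkernel. The linear cost model, the function name `break_even_extensions`, and all figures are hypothetical assumptions for illustration, not measurements.

```python
import math
from typing import Optional


def break_even_extensions(upfront_cost: float,
                          cost_per_plugin: float,
                          cost_per_hardcoded_extension: float) -> Optional[int]:
    """Smallest number of future extensions n at which the microkernel
    (upfront_cost + n * cost_per_plugin) costs no more than the hardcoded
    baseline (n * cost_per_hardcoded_extension).

    Assumes a linear cost model. Returns None if a plugin is never cheaper
    than a hardcoded extension, i.e. the microkernel never pays off.
    """
    saving_per_extension = cost_per_hardcoded_extension - cost_per_plugin
    if saving_per_extension <= 0:
        return None
    return math.ceil(upfront_cost / saving_per_extension)


# Hypothetical figures, in engineer-hours.
print(break_even_extensions(upfront_cost=400, cost_per_plugin=20,
                            cost_per_hardcoded_extension=100))  # 5
```

If the expected number of future plugins sits comfortably above the break-even point, the expansion factor favours the microkernel; if it is close to zero, the hardcoded baseline is the cheaper choice.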

It is also important to note that the above decisions are not exclusive. A microservice may encapsulate a microkernel, parts of a pipe-and-filter architecture might involve invoking microservices, and so on.

Catalogue of Economic Factors

In this section, we present a set of tables summarising economic factors to consider when making some common architectural decisions. The lists of factors are not exhaustive: expect changes as we add more over time. Nevertheless, these should get you started on making your decisions.

Notation Alert: The ‘+’ symbols represent potential economic benefits; the ‘-’ symbols represent potential economic downsides.

Dimension	Microservices with Monolith Baseline
Deployment	+ What are the savings in development/deployment time when services are deployed independently? - What is the effort in building pipelines for separate deployments? - What is the cost of building reusable provisioning scripts?
Monitoring	- What are the costs of setting up dashboards, alerts, and monitors for one microservice? For N microservices?
Tracing	- What are the costs of setting up standard tracing integrations across microservices? - What are the costs of maintaining traceability across a heterogeneous chain, part of which might be legacy? - What time losses could occur when tracing issues across services if tracing is not uniformly implemented?
Resources	- What is the cost of the additional cloud compute and DB resources that will be needed if each microservice needs to deploy and potentially scale independently? - Which services need to reserve capacity vs. which services have predictable load?
Downtime	- What is the cost of building circuit breaker/throttling infrastructure for multiple services? - What is the cost of building caching layers across services if services need to be available? + What are the benefits in terms of uptime when failures are localised to specific microservices?
Latency	- What is the loss in profits (if applicable) if a certain latency threshold is not met? - What is the cost (caching, duplication of data, etc.) of reducing latency so that it stays below this threshold?
Scaling	+ What is the expected opportunity loss if the monolith cannot be scaled beyond a certain point? - What is the cost of having to scale X microservice along with corresponding components like databases, downstream microservices, etc.?
Option Premium	+ What is the cost of building a modular monolith to take advantage of migrating to microservices later?
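One way to put a table like this to work is to attach an estimated monthly amount to each question, sign it according to the +/- notation, and discount the net cash flow against the monolith baseline. The sketch below does that for a handful of rows; the dimension names come from the table, but the `Factor` type, every amount, the 12-month horizon, and the 1% monthly discount rate are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Factor:
    dimension: str         # a row from the table, e.g. "Deployment"
    sign: int              # +1 for a benefit ('+'), -1 for a downside ('-')
    monthly_amount: float  # hypothetical estimate, currency units per month


def npv(cash_flows: Sequence[float], rate: float) -> float:
    """Net present value of per-period cash flows; period 0 is undiscounted."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def net_value(factors: List[Factor], months: int, monthly_rate: float,
              upfront_cost: float = 0.0) -> float:
    """Discounted net value of a decision relative to its baseline:
    signed monthly factors are summed, discounted, and one-off costs subtracted."""
    monthly_net = sum(f.sign * f.monthly_amount for f in factors)
    return npv([monthly_net] * months, monthly_rate) - upfront_cost


# Hypothetical estimates for microservices vs. a monolith baseline.
factors = [
    Factor("Deployment", +1, 6_000),  # time saved by independent deployments
    Factor("Monitoring", -1, 1_500),  # dashboards/alerts for N services
    Factor("Resources", -1, 2_500),   # extra compute and DB instances
    Factor("Downtime", +1, 1_000),    # failures localised to one service
]
# One-off cost, e.g. building separate deployment pipelines (hypothetical).
print(round(net_value(factors, months=12, monthly_rate=0.01, upfront_cost=20_000)))
```

A positive result suggests the decision pays for itself over the horizon relative to the baseline; a negative one suggests the baseline is cheaper. The exact number matters less than forcing each row of the table to carry an explicit estimate.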

Dimension	Microkernel with Hardcoded Components Baseline
Future Functionality	+ What are the cost savings of adding substitute/additional functionality through standard plugin interfaces?
Error Handling / Failure Scenarios	+ What are the cost savings of not having to rewrite common/standard error handling scenarios?
Static/Dynamic Binding	+ What are the cost savings of being able to swap out plugin implementations at compile time/runtime?
Plugin Testing	+ What are the cost savings of being able to test plugins independently?

Dimension	Event-Driven with Peer-to-Peer Baseline
Future Consumers	+ What are the cost savings of being able to add additional consumers without rewiring direct invocation? + What are the cost savings of being able to test future consumers independently using synthetic events? - What is the cost of having to maintain and evolve backward-compatible event schemas?
Architecture	- What is the cost of having to build orchestrators or choreographing facilities? - What is the cost (if any) of having to deal with potential incoming out-of-order events? - What is the cost of using a product to facilitate these interactions? - What is the cost of having to build facilities to persist states in case multiple events need to be received to reconstruct a domain entity? - What is the cost of building caching to rebuild your store if this is an event-sourced system? - What is the cost of setting up periodic compaction of historical events, if this is an event-sourced system? - What is the cost of separating and maintaining read and write schemas, if this is a CQRS system?
Tracing	- What is the cost of having to reconstruct fault trees from event traces? - What is the cost of building infrastructure to propagate tracing information across separate processes (if applicable)?
Failure Scenarios	- What is the cost of setting up additional infrastructure to handle / retry in the case of failure scenarios? - What is the cost of performing event replays in the middle of an event chain? - What is the cost of building in explicit event flows for rollbacks in an event chain? - What is the additional cost of building detection of events lost in transit and possibly compensating for incomplete event chains?
Evolution	+ What are the cost advantages in terms of adding/removing consumers without modifying sourcing events? + What are the potential future cost savings gained by allowing replacement of the system by strangulation?
Performance	- What is the potential opportunity loss of higher latencies of certain performance-sensitive operations exceeding acceptable SLAs? - What is the cost of any architectural changes to optimise reads and writes (e.g., CQRS)?

Dimension	Pipe and Filter with Hardcoded Components Baseline
Future Reconfiguration	+ What are the cost savings of being able to add/modify/remove components to the pipeline without having to modify the underlying infrastructure?
Monitoring	- What is the cost of having to set up monitoring for each individual data processing step? - What is the cost of having to aggregate this at an enterprise level (like federated Prometheus, for example)?
Tracing	- What is the cost of having to set up extra tracing to trace data flow in error/diagnosis scenarios?
Stream Processing complexity	- What is the cost of configuring the system to handle complex dependencies between streaming data events (things like streaming joins, out of order events, etc.)?
Failure Scenarios	- What is the cost of setting up additional infrastructure to handle / retry in the case of failure scenarios? - What is the cost of performing event replays in the middle of an event chain? - What is the cost of building in explicit event flows for rollbacks in an event chain?

Dimension	NoSQL with RDBMS Baseline
Constraints and References	- What is the cost of having to define software-level constraints and referential integrity checks? + What are the cost savings from speedups gained by not enforcing constraints?
Data Schema	- What are the costs in maintaining backward-compatible schemas? + What are the cost savings of not having to do schema migrations with data model changes?
PACELC guarantees	- Are there any potential cost implications of inconsistent or slow-to-retrieve data (like time-sensitive data in financial markets) even when the system is not partitioned? If so, what is this cost? + If there is partitioning, what are the cost benefits of having the system available (if the system is AP)?
Scaling	+ What are the cost savings of not having to scale vertically, or introduce other techniques like partitioning to keep the database performant?
Redundancy and Replication	+ What are the cost savings of building replicas and failovers for disaster recovery over their RDBMS counterparts? + What are the cost benefits of being able to tap into the database’s event stream for change data capture?

Tests as Markers of Economic Value

We have spoken about how value can be measured, using the income approach, the market approach, etc. However, the question still remains: how do we connect the decisions we make (at the code level, at the architecture level, etc.) to actual economic value?

At the business level, the closest connection to economic value is the feature of an application. Features are more or less atomic units of user-facing functionality (the user can be a human or another system) which can be (hopefully) deployed, enabled/disabled, and monetised independently.

Using features as units of economic value therefore seems plausible. The next question then arises: how do we verify that these features satisfy all the criteria to deliver this value? We propose a simple and natural answer: tests. Developers already use tests to validate every part of the system, at multiple levels of abstraction, ranging from unit tests to integration tests to regression tests.

We propose that economic value be attached to the tests which verify that features function properly. Different aspects of the feature can be validated by different sorts of tests.
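A minimal sketch of what attaching economic value to tests could look like, assuming a hypothetical `economic_value` decorator and an in-memory `VALUE_REGISTRY` (neither is part of any real test framework). Each feature test records the value the business assigned to the feature it verifies, and `delivered_value` sums the value of the features whose tests currently pass.

```python
from typing import Callable, Dict, Iterable

# Hypothetical registry: test name -> value the business assigned to its feature.
VALUE_REGISTRY: Dict[str, float] = {}


def economic_value(amount: float) -> Callable[[Callable], Callable]:
    """Hypothetical decorator recording the economic value a feature test guards."""
    def decorator(test_fn: Callable) -> Callable:
        VALUE_REGISTRY[test_fn.__name__] = amount
        return test_fn
    return decorator


def delivered_value(tests: Iterable[Callable[[], None]]) -> float:
    """Run each feature test and sum the value of those that pass.
    A failing assertion means that value has not (yet) been delivered."""
    total = 0.0
    for test in tests:
        try:
            test()
        except AssertionError:
            continue
        total += VALUE_REGISTRY.get(test.__name__, 0.0)
    return total


@economic_value(50_000)  # hypothetical value
def check_processes_card_payments() -> None:
    assert True  # stand-in for a real feature test


@economic_value(20_000)  # hypothetical value
def check_cancels_authorised_amount() -> None:
    assert False, "cancellation not implemented yet"


print(delivered_value([check_processes_card_payments,
                       check_cancels_authorised_amount]))  # 50000.0
```

In a real suite, the same idea could probably be expressed through a test framework's own tagging mechanism (for example, custom markers in pytest) rather than a hand-rolled registry.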

graph TD
    subgraph features[Features]
        feature1[Feature]
        feature2[Feature]
        feature3[Feature]
    end
    subgraph patterns[Patterns]
        pattern1[Pattern 1]
        pattern2[Pattern 2]
        pattern3[Pattern 3]
    end
    subgraph architecture[Architecture]
        adr1[Architecture Decision 1]
        adr2[Architecture Decision 2]
        adr3[Architecture Decision 3]
    end
    code[Code]-->patterns
    code-->architecture
    patterns-->features
    architecture-->features
    feature1-->test1[Tests]
    feature2-->test2[Tests]
    feature3-->test3[Tests]
    test1-->economic_value[Economic Value]
    test2-->economic_value
    test3-->economic_value

Code may be refactored into patterns; more macro-level organisational units are generally represented as architectural elements. For this discussion, patterns are treated as lower-level abstractions than architectures, even though they appear at the same level in the diagram above. Thus, patterns are largely independent of the architectures they are applied in. For example, whether you are using a microservice architecture or not does not constrain you from either using or not using a factory pattern in any of those microservices.

As an example of how value flows through this chart, consider an e-commerce payment integration system: it could have several requirements, each of which delivers value. We’d like to derive concrete, quantitative values from these features. A sampling of these features is listed below:

It should be able to process Visa and Mastercard credit cards.
It should be able to process at least 1000 transactions per second.
It should be able to cancel an amount which has already been authorised if indicated.

Each of the above requirements can be verified to a certain degree of rigour through tests. What would be the economic contribution of the above requirements?

For the requirement of processing Visa/Mastercard credit cards, the income streams arising from the expected number of users with these kinds of credit cards (based on demographic analysis) making purchases of amounts (determined from historical data) over some period offer a straightforward way to derive the financial value of this feature. If we expect a median of 100,000 users/month with Visa/Mastercard credit cards to buy things at the site for a median amount of $50, i.e., $100{,}000 \times \$50 = \$5{,}000{,}000$ per month, then, given a hypothetical discount rate of 10% per month, the projected value of this feature over 3 months would be: $$\$5{,}000{,}000 + \frac{\$5{,}000{,}000}{1+0.1} + \frac{\$5{,}000{,}000}{(1+0.1)^2} \approx \$13{,}677{,}686$$
For the requirement of processing at least 1000 transactions per second, if the processing capability is already at or above the 1000 TPS number, the value is already counted as part of the transaction processing feature (i.e., no extra work needs to be done). If the capability is less than 1000 TPS, say 800 TPS, then the value of the feature is the opportunity loss because of not processing those extra 1000-800=200 transactions per second. The income streams arising from those 200 transactions per second performing financial transactions of some median amount over a sustained period of time could be a straightforward way to quantify the financial value of this feature.
For the requirement to cancel an already-authorised amount, the cost of having support staff available to respond to customer calls for cancellation, and perform this action manually, could be one way to quantify the value of this feature. If 10 support staff are paid about $4000/month each, and deploying this feature could halve the support staff needs (saving 5 salaries), then, at the same 10% monthly discount rate, the value of this feature over 2 months would be $5 \times \$4{,}000 + \frac{5 \times \$4{,}000}{1+0.1} \approx \$38{,}182$ (obviously, we are simplifying this for the purposes of illustration).
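The arithmetic in the examples above can be reproduced with a few lines of code. The sketch assumes, as the examples do, that the first month's cash flow is undiscounted and each later month is discounted by a further factor of $1 + r$, with $r = 0.1$. The `tps_opportunity_loss` helper turns the 1000-vs-800 TPS shortfall into a monthly opportunity loss; its median transaction amount and the number of peak-load seconds per month are hypothetical parameters, not figures from the text.

```python
def npv(monthly_cash_flow: float, months: int, monthly_rate: float) -> float:
    """NPV of a constant monthly cash flow; month 0 is undiscounted."""
    return sum(monthly_cash_flow / (1 + monthly_rate) ** t for t in range(months))


def tps_opportunity_loss(required_tps: float, capacity_tps: float,
                         median_amount: float, peak_seconds_per_month: float) -> float:
    """Monthly income lost to transactions that cannot be processed at peak load.
    Assumes every unprocessed transaction is lost revenue (a simplification)."""
    shortfall = max(0.0, float(required_tps - capacity_tps))
    return shortfall * peak_seconds_per_month * median_amount


card_payments = npv(100_000 * 50, months=3, monthly_rate=0.1)
support_savings = npv(5 * 4_000, months=2, monthly_rate=0.1)
print(round(card_payments))    # 13677686
print(round(support_savings))  # 38182

# Hypothetical: $50 median transaction, one hour of peak load per month.
print(tps_opportunity_loss(1000, 800, 50, 3600))  # 36000000.0
```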

References

Papers and Reports
- Consistency Tradeoffs in Modern Distributed Database System Design
- Making Sense of Stream Processing

Advice I’d give a younger me

2023-02-13T00:00:00+05:30

This is a weird mix of advice I’d give the less-experienced me, as well as reflections of my personal value system. This verbal diarrhoea came out all at once in a single sitting of 45 minutes. I apologise for some of the strong language in here, but I thought I’d share it without much censoring.

This post sums up a lot of my core beliefs and reflects many of my biases, so it’s not necessarily “good” advice; it’s just things I would share with someone I was mentoring. All of these are personal opinions.

Happy Reading!

On Learning and Abstractions

Learning underpins engineering. This can either be from an engineering aspect or from a strategy aspect. Everything is learnable; it mostly depends upon what you are primarily drawn to. Whether it is building executive messaging skills, building connections, or nudging people towards their stated goals, all of these are within your cognitive abilities.
Learn upwards. These include things that we consider “bullshit skills”. However, these are the skills which make people feel warm and fuzzy and more inclined to cooperate with you (or at least not be hostile towards you) in order to get your job done. These are the skills that will help you in discussions with clients and internal management. If it helps you to think more like an engineer, this is the tech stack for communicating with humans, who are sometimes illogical, ego-driven, and frequently have goals different from yours.
Learn downwards. Always keep learning new technical things. Things which drive you. If you have found yourself in a position which prevents you from doing programming that you like, find time to do that outside. Or pick other things which will still exercise your brain (I study pure math to keep my brain from rotting). But keep learning. This is a non-negotiable. Do not call yourself an engineer otherwise, just a person who is collecting a paycheck (which is also fine, btw. I don’t judge, but that’s contrary to my value system). See Resilient Knowledge Bases : Fundamentals, not Buzzwords for some thoughts on the nature of things that I think you should focus on.
But, most importantly, learn upwards. The upwards skills will seem useless, difficult to put into practice, and do not guarantee results in every situation. See the next couple of points on why this is still worth investing in.
Engineers must learn to abstract. Keep learning consulting tools. Bigshots want the 10,000 feet view and hate processing details. Think of them as just programmers who are working at a higher level of abstraction. Imagine if you had to understand what the circuits of the computer were doing every time you wrote a single line of code in your favourite programming language. Invest in moving to a higher level of abstraction when expressing your thoughts. This ties into the skills needed to get a seat at higher levels of decision-making.

On Decision-Making and Influencing

Engineers should – and must – have control and a say in strategic decisions. They must have a voice at every level of management. Products and services do not exist without the work of engineers. Engineers are not order takers – YOU are not an order taker. You may take advice, but not orders. Accordingly, if decisions are being taken on your account which affect you and/or your team, and you do not have an engineering voice at the table, question, Question, QUESTION. Get yourself (or someone you trust) a seat at the table. If you do not do this, someone will make your decisions for you. I don’t think you’d like that. You may think that all you really need is to build great software, but this is the consulting industry. Simply building good software is not enough. There is client perception management, there’s internal perception management, there’s a lot of bullshit going on, and some people thrive on it. Some people get used to it. Some people navigate through it. You are no exception. Understand that you can choose to ignore these factors, but these factors will not ignore you.
In accordance with the above point, I believe that engineers have one fundamental duty: to clarify this reality to less experienced engineers who are more focussed on developing their hard skills. Not educating our peers about this can lead to a vicious cycle of engineers relegated to executors of other people’s visions. You don’t want that. Fuck that noise. Strategic decisions are too important to be left to bigshots and nontechnical people.
However, before you go running off into the wild, hysterically demanding a seat at every table, it is important to understand the appropriate language at the appropriate level of abstraction. I touched on the topic of abstraction above, but let us expand on this a little further. The language I am talking about is the language of economics. Indeed, most important decisions at the organisation level are always taken against the backdrop of profit/loss and discounted cash flows, albeit with a technological bent. In fact, there continues to be a wide disconnect between the mindset with which most software developers make decisions and the one with which executives make theirs. This usually shows up as distrust and disbelief around estimates, friction and frustration around value articulation, and so on. My increasingly strengthening belief is that all software engineers should think relentlessly like economists. There are several practices in agile development which encourage this mindset, but my contention is that the spirit of the concept is lost behind mindless practice, and that there is much more that developers and architects could – and should – do to bridge this gap.
Engineers must learn to think in terms of options. Too many times, engineering viewpoints are overridden by financial considerations, revenue considerations, delivery considerations, because, very rightly, the decision-makers are thinking inside a different framework. Actually, most of the time, everything translates to money. Thus, it behooves you to always think in terms of a spectrum of options, starting with “cheapest-shittiest” to “expensive-elegant”. Understand that you are the custodian of pulling the slider in the direction of “expensive-elegant”, while the bean-counters are looking for “cheapest-shittiest” but won’t always say that out loud, usually couching it in more diplomatic terms like “we’ll pay off the tech debt later”, “there needs to be a more creative solution” (Have I mentioned how much I hate it when people mention “creative”?). There are several ways this can be achieved. I outline the beginnings of one which leverages the economics-based frameworks to guide and project engineering decisions. See here and here.

On Building Engineering Culture

Culture, particularly engineering culture, cannot flourish in a vacuum. It must be aggressively nurtured. Nurturing is not about deciding to do things by committee. It needs to be fostered by example. If you do not see the culture you want to be in, you have 4 options: 1) wither away full of regret in an unfulfilling environment, 2) take steps to lead by example, 3) instill culture by fiat (never really works, so I don’t even know why I put this as an option), 4) leave. (4) is the nuclear option. (1) is the do-nothing option. (2) is sort of a gamble, because its success doesn’t simply depend upon your skill. It also depends upon your personality, how you showcase your examples, how you include people. Reputation naturally grows from the quality of your work, but management is continuously eyeing you as an untapped opportunity to train the next generation of technologists.
Decide how much you want to invest in building culture without sacrificing your own personal goals. Remember that your personal goals are always the most important; stay true to them, but remember that they are not the only ones. In most cases, as an engineer (unless you are actively planning to go completely hands-off), your personal goals will not align with what the organisation really expects you to do (to justify your billing rates). It is a constant trade-off between what you’d really rather be doing vs. what you feel you must do to advance in your organisation (whether your motivation is money, the corporate ladder, or whatever). Make that conscious decision, but MAKE it. Don’t stagnate.

On Performance Reviews

On that same note, performance reviews are bullshit and mostly performative. Half of it is based on the fantasies people have of what the perfect person in your role will be doing, and the other half is ticking checkboxes to satisfy some inane requirement that we are helping everyone grow. To be honest, you cannot really blame management, because performance reviews are literally the only shortcut that they can think of whenever they want to know if they are wringing the maximum value out of people in a large, scaling organisation. (Btw, Dunbar’s Number is disputed heavily in later studies; another example of how corporate thinking clings on to buzzwords without examining new evidence; see https://royalsocietypublishing.org/doi/10.1098/rsbl.2021.0158). Instead, find yourself a set of people whose work you really admire; this work can be upward-focussed and/or downward-focussed. Compare yourself against these people. Ask them. Learn by imitation. If you get to a point where you start doing some of the things you weren’t doing before because of these people, you have improved. Full stop.
On the flip side, organisational performance reviews are good for one thing: they act as reminders in case you are really drifting somewhat aimlessly. Otherwise, repurpose this inane bureaucratic exercise as you see fit. Put down the goals that you are really interested in and work through those at your own (hopefully motivated) pace. There will always be organisational goals that you will be expected to work towards too. Take those as opportunities to learn those upwards-facing skills, and try things which are outside your comfort zone. Remember, try anything once. You don’t have to like it, and you can stop doing it after a couple of review cycles. You always have the choice to say “No” to something; the consequences are never as dire as you fear.

For Introverts

This section is specifically for introverts, since I am one myself.

Do not let anyone pressure you into making an on-the-spot engineering decision. Firmly say “I need to think about it”.
Be shameless in interrupting. Extroverted, consulting types hog a lot of time and think nothing of it. You are not being rude in interrupting in order to get your point across. People do it all the time, and are hailed as being assertive. So go be “assertive”.
If you are thinking, and someone interrupts, feel free to excuse yourself and walk away and continue your thought. People do not respect the state of flow, knowingly or unknowingly (even engineers aren’t telepaths), and they need to be firmly reminded that this is not acceptable.

Every Software Engineer is an Accountant

2023-02-04T00:00:00+05:30

This article continues from where Every Software Engineer is an Economist left off, and delves slightly deeper into some of the topics already introduced there, as well as several new ones. In the spirit of continuing the theme of “Every Software Engineer is an X”, we’ve chosen accounting as the next profession.

The posts in this series of Software Engineering Economics are, in order:

Every Software Engineer is an Economist
Every Software Engineer is an Accountant (this one)
Economic Factors in Software Architectural Decisions

In this article, we cover the following:

Waterfall Accounting: Capitalisable vs. Non-Capitalisable Costs
Articulating Value: The Value of a Software System
Articulating Value: The Cost of Reducing Uncertainty
Articulating Value: The Cost of Expert but Imperfect Knowledge
Articulating Value: The Cost of Unreleased Software
Static NPV Analysis Example: Circuit Breaker and Microservice Template

Waterfall Accounting: Capitalisable vs. Non-Capitalisable Costs

Capitalisable is an accounting term that refers to costs that can be recorded on the balance sheet as an asset, as opposed to being expensed immediately. These costs are viewed more favourably because they are spread out over the useful life of the asset, reducing the impact on net income in any single period. The accounting standards outline specific criteria for determining which costs are capitalisable; one criterion is the extent to which they provide a long-term benefit to the organisation.
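To make "spread out over the useful life of the asset" concrete, here is a minimal sketch comparing a cost that is expensed as incurred with the same cost capitalised and amortised on a straight-line basis. The $120000 cost, the three-year useful life and the zero salvage value are hypothetical, and straight-line amortisation is assumed here as one common choice, not the only one the standards allow.

```python
from __future__ import annotations


def expensed_schedule(cost: float, years: int) -> list[float]:
    """Expense recognised per year if the whole cost is expensed as incurred (year 1)."""
    return [cost] + [0.0] * (years - 1)


def amortised_schedule(cost: float, useful_life_years: int) -> list[float]:
    """Expense recognised per year if the cost is capitalised and amortised
    straight-line over its useful life, assuming zero salvage value."""
    annual = cost / useful_life_years
    return [annual] * useful_life_years


if __name__ == "__main__":
    cost = 120_000.0  # hypothetical development cost
    life = 3          # hypothetical useful life in years
    pairs = zip(expensed_schedule(cost, life), amortised_schedule(cost, life))
    for year, (expensed, amortised) in enumerate(pairs, start=1):
        print(f"Year {year}: expensed={expensed:>9.0f}  capitalised+amortised={amortised:>9.0f}")
```

The total expense over the three years is identical in both cases; capitalising only changes when the cost hits the income statement, which is why first-year net income looks better.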

Accounting plays a significant role in software development processes. Specific guidelines govern which costs can be capitalised and which must be expensed as incurred. Unfortunately, the accounting world lags behind the agile development model: the GAAP guidelines were established based on the waterfall model of software development.

Costs can be capitalised once “technological feasibility” has been achieved. FASB ASC Topic 985 says that:

“the technological feasibility of a computer software product is established when the entity has completed all planning, designing, coding, and testing activities that are necessary to establish that the product can be produced to meet its design specifications including functions, features, and technical performance requirements.”

Agile doesn’t work that way. Because it is iterative, Agile does not have “one-and-done” stages of development, and there is not necessarily a clear point at which “technological feasibility” is achieved; the criteria for “technological feasibility” may therefore be an important point for client and vendor to agree upon.

The problem is this: according to the guidelines, the costs that should not be capitalised include the work needed to understand the product’s desired features and feasibility; these costs should be expensed as incurred.

For example, in the development of external-use software (software developed for purchase or lease by external customers), the following activities cannot be capitalised:

Upfront analysis
Knowledge acquisition
Initial project planning
Prototyping
Comparable design work

The above points apply even during iterations/sprints. If we wanted to be really pedantic, during development, the following activities cannot be capitalised either, but must be expensed:

Troubleshooting
Discovery

This may be an underlying reason why companies are leery of workshops and inceptions: their costs probably end up being expensed as incurred instead of being capitalised. (Source)

Value Proposition: We should aim to optimise workshops and inceptions.

Capitalisable and Non-Capitalisable Costs for Cloud

For Cloud Costing, we have the following categories from an accounting perspective:

Capitalisable Costs
- External direct costs of materials
- Third-party service fees to develop the software
- Costs to obtain software from third-parties
- Coding and testing fees directly related to software product
Non-Capitalisable Costs
- Costs for data conversion activities
- Costs for training activities
- Software maintenance costs

This link and Accounting for Cloud Development Costs are readable treatments of the subject. Also see this.

Articulating Value: The Value of a Software System

There is no consensus on how the value of engineering practices should be articulated. Metrics such as the DORA metrics can quantify the speed at which features are released, but the ultimate consequences (savings in effort or eventual profits, for example) are seldom quantified. It is not that estimates of these numbers are unavailable; they are discussed when making the business case for investing in a project, but engineering teams almost never encounter or leverage them to articulate how they are progressing towards their goal. The measure of progress across iterations is story points, which is useful, but story points only quantify the run cost, not the actual final value that the investment will deliver.

How, then, do we articulate this value?

Economics and current accounting practices can show one way forward.

One straightforward way to quantify software value is to turn to Financial Valuation techniques. Ultimately, the value of any asset is determined by the amount of money that the market wants to pay for it. Software is an intangible asset. Let’s take a simple example: suppose the company which owns/builds a piece of software is being acquired. This software could be for its internal use, e.g., accounting, order management, etc., or it could be a product that is sold or licensed to the company’s clients. This software needs to be valued as part of the acquisition valuation.

The question then becomes: how is the valuation of this software done?

There are several ways in which valuation firms estimate the value of software.

1. Cost Approach

This approach is usually used for valuing internal-use software. The cost approach, based on the principle of replacement, determines the value of software by considering the expected cost of replacing it with a similar one. There are two types of costs involved: reproduction costs and replacement costs. Reproduction Costs evaluate the cost of creating an exact copy of the software. Replacement Costs measure the cost of recreating the software’s functionality.

Trended Historical Cost Method: The trended historical cost method calculates the actual historical development costs, such as programmer personnel costs and associated expenses, such as payroll taxes, overhead, and profit. These costs are then adjusted for inflation to reflect the current valuation date. However, implementing this method can be challenging, as historical records of development costs may be missing or mixed with those of operations and maintenance.
Software engineering model method: This method takes specific metrics from the software system, like size/complexity, and feeds them to empirical software development models like COCOMO (Constructive Cost Model and its sequels) and SLIM (Software Lifecycle Management) to get estimated costs. The formulae in these models are derived from analyses of historical databases of actual software projects.
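Both cost-approach methods above reduce to simple arithmetic once the inputs are known. The sketch below shows a trended historical cost (each year's spend restated by the ratio of a price index at the valuation date to the index in the year of spend) and a replacement cost built from Basic COCOMO, where effort in person-months is $a \cdot \text{KLOC}^b$ using Boehm's original mode coefficients. All spend figures, index values, sizes and the loaded monthly rate are hypothetical; valuation firms typically use richer models (for example COCOMO II or SLIM with many more cost drivers).

```python
from __future__ import annotations

# Basic COCOMO (Boehm, 1981): Effort = a * KLOC**b person-months.
COCOMO_BASIC = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def trended_historical_cost(costs_by_year: dict[int, float],
                            price_index: dict[int, float],
                            valuation_year: int) -> float:
    """Restate each year's historical development cost in valuation-year money
    by scaling it with the ratio of price indices, then sum the results."""
    base = price_index[valuation_year]
    return sum(cost * base / price_index[year] for year, cost in costs_by_year.items())


def cocomo_replacement_cost(kloc: float, monthly_rate: float,
                            mode: str = "organic") -> float:
    """Replacement cost estimate: Basic COCOMO effort times a loaded monthly rate."""
    a, b = COCOMO_BASIC[mode]
    effort_person_months = a * kloc ** b
    return effort_person_months * monthly_rate


if __name__ == "__main__":
    spend = {2019: 200_000.0, 2020: 150_000.0}        # hypothetical historical spend
    index = {2019: 100.0, 2020: 102.0, 2023: 115.0}   # hypothetical price index
    print(round(trended_historical_cost(spend, index, 2023)))
    print(round(cocomo_replacement_cost(kloc=50.0, monthly_rate=10_000.0)))
```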

See Application of the Cost Approach to Value Internally Developed Computer Software: Willamette Management Associates for some comprehensive examples of this approach.

Obviously, this approach largely ignores the actual value that the software has brought to the organisation, whether it is in the form of reduced Operational Expenses, or otherwise.

2. Market Approach

The market approach values software by comparing it to similar packages and adjusting for any differences. One issue with this method is the lack of comparable transactions, especially for internal-use software designed to specific standards: more data is available for transactions in the shares of software development companies than for software itself. This method could potentially apply to internal-use systems that are being developed even though commercial off-the-shelf (COTS) solutions are available; this could be because the COTS solutions are not exact fits to the problem at hand, or lack some specific features that the company could really do with.

3. Income Approach

The Income Approach values software based on its future earnings, or cost savings. The discounted cash flow method calculates the worth of software as the present value of its future net cash flows, taking into account expected revenues and expenses. The cash flows are estimated for the remaining life of the software, and a discount rate that considers general economic, product, and industry risks is calculated. If the software had to be licensed from a third party, its value is determined based on published license prices for similar software found in intellectual property databases and other sources.
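The discounted cash flow calculation itself is short. The sketch below assumes end-of-period cash flows and a single constant discount rate; the projected net cash flows and the 15% rate are hypothetical placeholders for the expected revenues minus expenses and the risk-adjusted rate described above.

```python
from __future__ import annotations


def npv(rate: float, cash_flows: list[float]) -> float:
    """Present value of end-of-period cash flows: cash_flows[k] arrives at the
    end of period k + 1 and is discounted at a constant per-period `rate`."""
    return sum(cf / (1 + rate) ** (k + 1) for k, cf in enumerate(cash_flows))


if __name__ == "__main__":
    # Hypothetical net cash flows (revenues minus expenses) over a 5-year remaining life.
    projected_net_cash_flows = [40_000.0, 45_000.0, 50_000.0, 50_000.0, 30_000.0]
    discount_rate = 0.15  # hypothetical rate reflecting economic, product and industry risk
    print(f"DCF value: {npv(discount_rate, projected_net_cash_flows):,.0f}")
```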

The Income Approach is the one most often used by corporate valuation companies when valuing intangible assets like software during an acquisition. However, it usually assumes that the software is complete and serving its purpose, not software which is still in development (or not providing cash flows right now).

Discounted cash flow method: This is the usual method where an NPV analysis is done on projected future cash flows arising from the product.
Relief from Royalty Method (RRM): This method values an intangible asset by taking into account the hypothetical royalty payments that are avoided by owning the asset instead of licensing it. The idea behind the RRM is straightforward: owning an intangible asset eliminates the need to pay for the right to use that asset. The RRM is commonly applied in the valuation of domain names, trademarks, licensed computer software, and ongoing research and development projects that can be associated with a particular revenue stream, and where market data on royalty and license fees from previous transactions is available. One possible example is a company building its own private cloud as an alternative to AWS; the value of the project could be calculated from the fees that are projected to be saved by not using AWS to host its services.
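A minimal sketch of the RRM under simplifying assumptions: avoided payments are taxed at a flat rate (because royalties would have been tax-deductible) and discounted at a constant annual rate, and refinements such as a tax amortisation benefit are omitted. The revenues, the 5% royalty rate, the 25% tax rate, the 12% discount rate and the projected AWS fees are all hypothetical.

```python
from __future__ import annotations


def royalties_avoided(revenues: list[float], royalty_rate: float) -> list[float]:
    """Royalty that would be payable on each year's attributable revenue."""
    return [revenue * royalty_rate for revenue in revenues]


def relief_from_royalty(avoided_payments: list[float], tax_rate: float,
                        discount_rate: float) -> float:
    """PV of after-tax payments avoided by owning the asset; avoided_payments[k]
    is assumed to fall at the end of year k + 1."""
    return sum(
        payment * (1 - tax_rate) / (1 + discount_rate) ** (year + 1)
        for year, payment in enumerate(avoided_payments)
    )


if __name__ == "__main__":
    # Hypothetical licensed-software case: 5% royalty on attributable revenue.
    revenues = [1_000_000.0, 1_100_000.0, 1_200_000.0]
    print(round(relief_from_royalty(royalties_avoided(revenues, 0.05), 0.25, 0.12)))
    # Hypothetical private-cloud case: projected AWS bills are the avoided payments.
    projected_aws_fees = [300_000.0, 330_000.0, 360_000.0]
    print(round(relief_from_royalty(projected_aws_fees, 0.25, 0.12)))
```

For the private-cloud case, a fuller analysis would presumably also net off the cost of building and running the private cloud itself.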

Real Options Valuation

This is used when the asset (software) is not currently producing cash flows, but has the potential to generate cash flows in the future, incorporating the idea of the uncertain nature of these cash flows. The paper Modeling Choices in the Valuation of Real Options: Reflections on Existing Models and Some New Ideas discusses classic and recent advances in the valuation of real options. Specifically surveyed are:

Black-Scholes Option Pricing Formula: The original model; makes rigid assumptions about the underlying asset; not originally intended for pricing real options
Binomial Option Pricing Model: A discrete-time approximation of Black-Scholes; not originally intended for pricing real options
Datar-Mathews Method: Simulation-based model with cash flows as expert inputs; no rigid assumptions about cash flow models
Fuzzy Pay-off Method: Payoff treated as a fuzzy number with cash flows as expert input; no rigid assumptions
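To give a flavour of the simulation-based approach, here is a simplified sketch in the spirit of the Datar-Mathews method: expert low / most likely / high estimates of the value of operating profits at launch are sampled (a triangular distribution is assumed here), operating value is discounted at a hurdle rate, the launch cost at a risk-free rate, and the payoff is truncated at zero because we would simply not launch a project that loses money. All inputs are hypothetical, and this is not a full implementation of the published method.

```python
import random


def datar_mathews_value(launch_cost: float, ops_low: float, ops_likely: float,
                        ops_high: float, years_to_launch: float,
                        hurdle_rate: float, risk_free_rate: float,
                        n_trials: int = 100_000, seed: int = 42) -> float:
    """Mean of max(PV(operating value) - PV(launch cost), 0) over simulated trials."""
    rng = random.Random(seed)
    # Launch cost is treated as relatively certain: discount at the risk-free rate.
    pv_cost = launch_cost / (1 + risk_free_rate) ** years_to_launch
    # Operating profits are risky: discount at the hurdle rate.
    ops_discount = (1 + hurdle_rate) ** years_to_launch
    total = 0.0
    for _ in range(n_trials):
        pv_ops = rng.triangular(ops_low, ops_high, ops_likely) / ops_discount
        total += max(pv_ops - pv_cost, 0.0)  # do not launch if it would lose money
    return total / n_trials


if __name__ == "__main__":
    # Hypothetical expert inputs, in dollars.
    value = datar_mathews_value(launch_cost=1_000_000.0, ops_low=400_000.0,
                                ops_likely=1_100_000.0, ops_high=2_500_000.0,
                                years_to_launch=2, hurdle_rate=0.15,
                                risk_free_rate=0.04, n_trials=20_000)
    print(f"Option value: {value:,.0f}")
```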

I admit that I’m partial to the Binomial Option Pricing Model, because the binomial lattice graphic is very explainable; we’ll cover the Binomial Option Pricing Model and the Datar-Mathews Method in a sequel.
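Ahead of that sequel, here is a minimal binomial lattice sketch, using the Cox-Ross-Rubinstein parameterisation (up factor $u = e^{\sigma\sqrt{\Delta t}}$, down factor $d = 1/u$) as one common choice. For a real option, `s0` would stand for the present value of the project's cash flows, `strike` for the investment cost, and `maturity` for how long the decision can be deferred; treating the option as European (decide only at maturity) is a simplifying assumption.

```python
import math


def binomial_call(s0: float, strike: float, rate: float, sigma: float,
                  maturity: float, steps: int) -> float:
    """European call value on a Cox-Ross-Rubinstein binomial lattice."""
    dt = maturity / steps
    u = math.exp(sigma * math.sqrt(dt))        # up factor per step
    d = 1 / u                                  # down factor per step
    disc = math.exp(-rate * dt)                # one-step discount factor
    p = (math.exp(rate * dt) - d) / (u - d)    # risk-neutral probability of an up move
    # Payoffs at the final layer of the lattice, for j up-moves out of `steps`.
    values = [max(s0 * u ** j * d ** (steps - j) - strike, 0.0) for j in range(steps + 1)]
    # Roll back through the lattice, one layer at a time.
    for _ in range(steps):
        values = [disc * (p * values[j + 1] + (1 - p) * values[j])
                  for j in range(len(values) - 1)]
    return values[0]


if __name__ == "__main__":
    # Hypothetical: PV of project cash flows 100, investment 100, 1 year to decide.
    print(round(binomial_call(100.0, 100.0, 0.05, 0.2, 1.0, steps=500), 4))
```

As the number of steps grows, the lattice value converges to the Black-Scholes price, which is why the binomial model is described above as its discrete-time approximation.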

What approach do we pick?

There is no one approach that can account for all types of software. At the same time, multiple approaches may be applicable to a single type of software, with varying degrees of importance. It is important to note that the following categories are not mutually exclusive. The map below shows the type of value analysis that could be done for each kind of investment or asset.

graph TD
    platform[Platform] --> rov[Real Options Valuation]
    external[Products with External Transactional Value] --> dcf[Discounted Cash Flow]
    external --> market[Market]
    internal[Internal-Use Products] --> opex_npv[OpEx NPV Analysis]
    internal --> rrm[Relief from Royalty Method]
    enterprise_modernisation[Enterprise Modernisation] --> rov
    enterprise_modernisation --> opex_npv
    maintenance[Maintenance] --> opex_npv

1. Platform
Use: Real Options Valuation
A platform by itself does not provide value; its chief attraction is the opportunities it creates to rapidly build and offer new products to the market. A platform also allows creating other types of options, like allowing the company to build customised products of the same type to enter new markets. For example, a custom e-commerce platform not only creates options for higher volumes of sales transactions in the current country, but also provides the option to deploy a custom e-commerce site in a new country.

2. Products providing External Transactional Value
Use: Income, Market
These cover software which enables e-commerce, or allows access to assets in exchange for money. In many situations, projected incoming cash flows are easier to predict because of historical data, and provide a direct link to the value of the software. Note that components of these products may themselves be built on top of a platform; the platform might then be valued using real option pricing, while these products are the actual investments themselves.
Also, products like COTS e-commerce platforms are widely available, and thus can provide good benchmarks in terms of the value being provided by the custom implementation.

3. Internal-Use products
Use: OpEx NPV Analysis, Relief from Royalty, Market
These cover systems which are used to streamline operational processes inside the company, and thus reduce waste. It is to be noted that there might be second, third and n-th order effects of these systems, thus reasonable efforts should be made to articulate those effects to provide a lower bound on the value of these systems. In most cases (but maybe not all), these reductions apply to the operational expenses of the company, hence the NPV analysis of OpEx is suggested. Such systems may also have COTS alternatives, through subscription or outright purchase. In those cases, Relief from Royalty and Market methods are also valuable ways of benchmarking value.

4. Enterprise Modernisation initiatives
Use: Real Options Valuation, OpEx NPV Analysis
Enterprise Modernisation can have multiple objectives. It can target any combination of the following:

Mitigation of risk (no one knows how the old code works, and long-time maintainers are retiring)
Expansion of service capacity (to meet higher traffic)
Decrease time to market for future features (it might take very long to add features to the current system)

Enterprise Modernisation can certainly benefit from an NPV analysis of Operational Expenses, but the main reason for undertaking modernisation is usually creating options for a more diverse product portfolio, or faster time to market for new features to continue retaining customers.

5. Maintenance
Use: OpEx NPV Analysis
Maintenance of production software which isn’t expected to evolve (much) is usually a matter of minimising production issues, and streamlining the operational pipeline which is already (hopefully) reliably delivering value. The primary metric for value in this case should be the expected reduction in operational expenses. If new features are added occasionally, positive cash flows may also figure in this NPV analysis.

Another approach to valuing software using different dimensions is discussed in the paper The Business Value of IT; A Conceptual Model for Selecting Valuation Methods. However, these are not methods that are strictly used by valuation firms. We’ve reproduced the selection model below.

This paper also mentions using Information Economics to articulate value. Information Economics uses multiple criteria, both tangible and intangible, to arrive at a unified scorecard of value. Unfortunately, it does not attach a monetary value to the intangible value-creation processes. We may talk about it in the future. Information Economics: Managing IT Investment expounds upon this approach.

Articulating Value: The Cost of Reducing Uncertainty

We will use this spreadsheet again for our calculations. Earlier in this series, we spoke of the risk curve, which is the expected loss if the actual effort exceeds 310. Let us assume that the customer is adamant that we put extra effort into narrowing our estimates, so that we know whether we are over or under 310.

The question we’d like to answer is: how much are we willing to pay to reduce the uncertainty of this loss to zero? In other words, what is the maximum effort we are willing to spend to reduce the uncertainty of this estimate?

For this, we create a Loss Function, where the loss is simply calculated as $L_i = P_i \cdot E_i$ for every estimate $i \geq 310$. Unsurprisingly, this is not the only possible choice of loss function.

The answer is the area under the loss curve. This would usually be done by integration, which is easy for a normal distribution, but arbitrary distributions usually require numerical integration. In this case, we can very roughly integrate numerically, as shown in the diagram below, to get the maximum effort we are willing to invest.
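The spreadsheet's data is not reproduced here, so the following sketch uses a hypothetical normal effort distribution with mean 280 and standard deviation 20, and assumes that $E_i$ is the effort value itself, so the loss curve is $L(x) = p(x) \cdot x$ above the 310 threshold. It integrates that curve numerically with the trapezoidal rule and checks the result against the closed form that exists for a normal distribution, $\mu(1-\Phi(z)) + \sigma\varphi(z)$ with $z = (310-\mu)/\sigma$. It will not reproduce the 1.89 figure below.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def area_under_loss_curve(pdf, threshold: float, upper: float, steps: int = 10_000) -> float:
    """Trapezoidal integration of the loss L(x) = p(x) * x over [threshold, upper]."""
    h = (upper - threshold) / steps
    ys = [pdf(threshold + k * h) * (threshold + k * h) for k in range(steps + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


def area_closed_form_normal(threshold: float, mu: float, sigma: float) -> float:
    """Closed form for a normal distribution: mu * (1 - Phi(z)) + sigma * phi(z)."""
    z = (threshold - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    big_phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return mu * (1 - big_phi) + sigma * phi


if __name__ == "__main__":
    mu, sigma, threshold = 280.0, 20.0, 310.0  # hypothetical effort distribution
    numeric = area_under_loss_curve(lambda x: normal_pdf(x, mu, sigma),
                                    threshold, mu + 10 * sigma)
    print(numeric, area_closed_form_normal(threshold, mu, sigma))
```

The numerical route is the one that generalises to the arbitrary distributions mentioned above: only `pdf` needs to change.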

In our example, this comes out to 1.89. That is, we are willing to invest a maximum of 1.89 points of effort for the reduction in uncertainty to make economic sense. This value is termed the Expected Value of Information, and is broadly defined as the amount someone is willing to pay for information that will reduce uncertainty about an estimate or a forecast. This technique is usually used to calculate the maximum amount of money you’d be willing to pay for a forecast about a business metric that affects your profits, but the same principle applies to estimates as well.

Usually, the actual effort needed to reduce the uncertainty far exceeds this amount; hopefully an example like this can convince you that refining estimates is not necessarily a productive exercise.

Articulating Value: The Cost of Expert but Imperfect Knowledge

Suppose you, the tech lead or architect, want to make a decision about some architecture or tech stack. You’ve heard about it, and you think it would be a good fit for your current project. But you are not completely sure: in the worst case, there would be no benefit, only the cost sunk into implementing this decision. The two questions you’d like to ask are:

What is the maximum I’m willing to pay to reduce the uncertainty of this decision completely? This question is exactly the same as the one in the previous section, so is not in itself that novel, but it is a stepping stone to the next question.
What is the maximum I’m willing to pay to bring in an expert who can help me reduce this uncertainty to a lower value, but probably not to zero? In this case, the expert will not be able to provide you perfect information, and we must incorporate our confidence in the expert into our economics calculations.

We can use Decision Theory to quantify these costs. The technique we’ll be using involves Probabilistic Graphical Models, and all of this can be easily automated: this step-by-step example is for comprehension.

Suppose we have the situation above where a decision needs to be made. Implementing the decision costs $5000. There is a 30% probability that it will result in savings of $20000 going forward, and a 70% probability that there won’t be any benefit at all.

Let X be the event that there will be savings of $20000. Then $P(X)=0.3$. We can represent all the possibilities using a Decision Tree, like below.

Now, if we did not have any information beyond these probabilities, we’d pick the decision which maximises the expected payoff. The payoff from this decision is called the Expected Monetary Value, and is defined as:

\[EMV=\max_j \sum_i P_i \cdot R_{ij}\]

This is simply the maximum of the expected values arising from all the choices $j \in J$, where $P_i$ is the probability of outcome $i$ and $R_{ij}$ is the payoff of choice $j$ under outcome $i$. The expected monetary value of the “Implement” decision is $0.3 \times 15000 + 0.7 \times (-5000) = 1000$, whereas that of the “Do Not Implement” decision is zero. Thus, we pick the former, and our EMV is $1000.

Now assume we had a perfect expert who knew whether the decision is going to actually result in savings or not. If they told us the answer, we could effectively know whether to implement the decision or not with complete certainty.

The payoff then would be calculated using the following graph. The graph switches the chance nodes and the decision nodes, and for each chance node, picks the decision node which maximises the payoff.

We can then calculate expected payoff given perfect information (denoted as EV|PI) as:

\[EV|PI = \sum_i P_i \cdot \max_j R_{ij}\]

In our case, this comes out to $0.3 \times 15000 + 0.7 \times 0 = 4500$.
Thus the Expected Value of Perfect Information is defined as the additional amount we are willing to pay to get to EV|PI:

\[EVPI = EV|PI - EMV = 4500 - 1000 = 3500\]

Thus, we are willing to pay a maximum of $3500 to fully resolve the uncertainty of whether our decision will yield the expected savings or not.
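The EMV and EVPI arithmetic above can be checked with a few lines of code. This is a minimal sketch that encodes the example's payoff table directly (net +15000 with savings and -5000 without, if we implement; zero if we do not); the dictionary layout is just one convenient representation.

```python
# Decision problem from the text: implementing costs 5000; with probability
# 0.3 it yields savings of 20000 (net +15000), otherwise nothing (net -5000).
PRIORS = {"savings": 0.3, "no_savings": 0.7}
PAYOFFS = {  # PAYOFFS[state][decision] = R_ij
    "savings": {"implement": 15_000.0, "do_not_implement": 0.0},
    "no_savings": {"implement": -5_000.0, "do_not_implement": 0.0},
}
DECISIONS = ("implement", "do_not_implement")


def emv(priors: dict, payoffs: dict) -> float:
    """EMV = max over decisions j of sum over states i of P_i * R_ij."""
    return max(sum(priors[s] * payoffs[s][d] for s in priors) for d in DECISIONS)


def ev_given_perfect_info(priors: dict, payoffs: dict) -> float:
    """EV|PI = sum over states i of P_i * max over decisions j of R_ij."""
    return sum(priors[s] * max(payoffs[s][d] for d in DECISIONS) for s in priors)


if __name__ == "__main__":
    base = emv(PRIORS, PAYOFFS)
    with_pi = ev_given_perfect_info(PRIORS, PAYOFFS)
    print(f"EMV={base:.0f}  EV|PI={with_pi:.0f}  EVPI={with_pi - base:.0f}")
```

The only difference between the two functions is the order of `max` and the probability-weighted sum, which mirrors swapping the decision and chance nodes in the tree.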

But the example we have described is not a realistic one. In the real world, even if we pay an expert to help us resolve this, they are not infallible. They might increase the odds in our favour, but there is always a possibility that they are wrong. Suppose we find an expert to consult for us, who wants to be paid $3400. Are they overpriced or not?

We’d like to know what is the maximum we are willing to pay an expert if they can give us imperfect information about our situation. To do this, we will need to quantify our confidence in the expert.

Assume that if there are savings to be made, the expert says “Good” 80% of the time. If there are no savings to be made, the expert says “Bad” 90% of the time. This quantifies our confidence in the expert, and can be written as a table like so:

P(E | S)	E = Good	E = Bad
S = Savings	0.8	0.2
S = No Savings	0.1	0.9

In the above table, E is the random variable representing the opinion of the expert, and S is the random variable representing the realisation of savings. We can again represent all possibilities via a probability tree, like so:

graph LR
    A ==> savings["Savings, P(X)=0.3"]
    A ==> no_savings["No Savings, 1-P(X)=0.7"]
    savings --> expert_good_1["Good, P(Good|Savings)=0.8"]
    savings --> expert_bad_1["Bad, P(Bad|Savings)=0.2"]
    no_savings --> expert_good_2["Good, P(Good|No Savings)=0.1"]
    no_savings --> expert_bad_2["Bad, P(Bad|No Savings)=0.9"]
    expert_good_1 --> p_1["P(Good,Savings)=0.3 x 0.8 = 0.24"]
    expert_bad_1 --> p_2["P(Bad,Savings)=0.3 x 0.2 = 0.06"]
    expert_good_2 --> p_3["P(Good,No Savings)=0.7 x 0.1 = 0.07"]
    expert_bad_2 --> p_4["P(Bad,No Savings)=0.7 x 0.9 = 0.63"]
    p_1 --> p_good["P(Good)=0.24+0.07=0.31"]
    p_3 --> p_good
    p_2 --> p_bad["P(Bad)=0.06+0.63=0.69"]
    p_4 --> p_bad

We now have our joint probabilities $P(S,E)$. What we really want to find is $P(S \vert E)$. By Bayes’ Rule, we can write:

\[P(S|E)=\frac{P(S,E)}{P(E)}\]

We can thus calculate the conditional probabilities of savings given the expert’s prediction with the following graph.

graph LR
    p_1["P(Good,Savings)=0.3 x 0.8 = 0.24"] --> p_good["P(Good)=0.24+0.07=0.31"]
    p_2["P(Bad,Savings)=0.3 x 0.2 = 0.06"] --> p_bad["P(Bad)=0.06+0.63=0.69"]
    p_3["P(Good,No Savings)=0.7 x 0.1 = 0.07"] --> p_good
    p_4["P(Bad,No Savings)=0.7 x 0.9 = 0.63"] --> p_bad
    p_1 --> p_savings_good["P(Savings | Good)=0.24/0.31=0.774"]
    p_good --> p_savings_good
    p_2 --> p_savings_bad["P(Savings | Bad)=0.06/0.69=0.087"]
    p_bad --> p_savings_bad
    p_3 --> p_no_savings_good["P(No Savings | Good)=0.07/0.31=0.226"]
    p_good --> p_no_savings_good
    p_4 --> p_no_savings_bad["P(No Savings | Bad)=0.63/0.69=0.913"]
    p_bad --> p_no_savings_bad

Now we go back and calculate EMV again in the light of these new probabilities. The difference in this new tree is that in addition to the probability branches of our original uncertainty, we also need to add the branches for the expert’s predictions, whose conditional probabilities we have just deduced.

graph LR
    A ==>|0.31| p_good[Good]
    A ==>|0.69| p_bad[Bad]
    p_good ==> p_implement_good[Implement]
    p_good --> p_dont_implement_good[Do Not Implement]
    p_bad --> p_implement_bad[Implement]
    p_bad ==> p_dont_implement_bad[Do Not Implement]
    p_implement_good ==>|-5000| implement_savings_given_good["Savings=20000, P(Savings|Good)=0.774"]
    p_implement_good ==>|-5000| implement_no_savings_given_good["Savings=0, P(No Savings|Good)=0.226"]
    p_dont_implement_good -->|0| dont_implement_savings_given_good["Savings=0, P(Savings|Good)=0.774"]
    p_dont_implement_good -->|0| dont_implement_no_savings_given_good["Savings=0, P(No Savings|Good)=0.226"]
    p_implement_bad -->|-5000| implement_savings_given_bad["Savings=20000, P(Savings|Bad)=0.087"]
    p_implement_bad -->|-5000| implement_no_savings_given_bad["Savings=0, P(No Savings|Bad)=0.913"]
    p_dont_implement_bad ==>|0| dont_implement_savings_given_bad["Savings=0, P(Savings|Bad)=0.087"]
    p_dont_implement_bad ==>|0| dont_implement_no_savings_given_bad["Savings=0, P(No Savings|Bad)=0.913"]
    implement_savings_given_good ==> implement_savings_given_good_payoff["0.774 x (20000-5000)=11610"]
    implement_no_savings_given_good ==> implement_no_savings_given_good_payoff["0.226 x (0-5000)=-1130"]
    dont_implement_savings_given_good --> dont_implement_savings_given_good_payoff["0.774 x 0=0"]
    dont_implement_no_savings_given_good --> dont_implement_no_savings_given_good_payoff["0.226 x 0=0"]
    implement_savings_given_bad --> implement_savings_given_bad_payoff["0.087 x (20000-5000)=1305"]
    implement_no_savings_given_bad --> implement_no_savings_given_bad_payoff["0.913 x (0-5000)=-4565"]
    dont_implement_savings_given_bad ==> dont_implement_savings_given_bad_payoff["0.087 x 0=0"]
    dont_implement_no_savings_given_bad ==> dont_implement_no_savings_given_bad_payoff["0.913 x 0=0"]
    implement_savings_given_good_payoff ==> plus(("+"))
    implement_no_savings_given_good_payoff ==> plus
    plus ==> max_payoff_given_good[10480] ==> max_payoff[10480 X 0.31=3249]

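Thus, the expected payoff given the expert’s imperfect information (EV|II) is $3249; if the expert says “Bad”, the best decision is not to implement, which contributes nothing. As with perfect information, the value of the expert’s advice is the improvement it brings over the EMV we already get without it, so the Expected Value of Imperfect Information is EVII = EV|II - EMV = 3249 - 1000 = $2249. This is the maximum amount we’d be willing to pay this expert given our level of confidence in them. Remember that the EVPI was $3500, so EVII <= EVPI, as expected. The expert’s fee was $3400, which means that we would be overpaying the expert by $3400 - $2249 = $1151.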
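The whole imperfect-information calculation (joint probabilities, Bayes’ Rule, and picking the best decision for each of the expert’s verdicts) fits in one small function. This sketch uses the example’s numbers and, following the same definition used for EVPI, subtracts the EMV to get EVII. It works with unrounded posteriors, so it yields EV|II = 3250 rather than the 3249 shown in the tree, which rounds the posteriors to three decimal places.

```python
PRIOR_SAVINGS = 0.3
# Likelihoods P(E = Good | S): our confidence in the expert.
P_GOOD_GIVEN_SAVINGS = 0.8
P_GOOD_GIVEN_NO_SAVINGS = 0.1
NET_IF_SAVINGS = 20_000.0 - 5_000.0   # savings minus implementation cost
NET_IF_NO_SAVINGS = -5_000.0          # implementation cost only


def ev_given_imperfect_info() -> float:
    """EV|II: for each expert verdict, pick the better decision using the
    posterior P(Savings | E), then weight that payoff by P(E)."""
    total = 0.0
    for says_good in (True, False):
        like_s = P_GOOD_GIVEN_SAVINGS if says_good else 1 - P_GOOD_GIVEN_SAVINGS
        like_ns = P_GOOD_GIVEN_NO_SAVINGS if says_good else 1 - P_GOOD_GIVEN_NO_SAVINGS
        joint_s = PRIOR_SAVINGS * like_s           # P(E, Savings)
        joint_ns = (1 - PRIOR_SAVINGS) * like_ns   # P(E, No Savings)
        p_e = joint_s + joint_ns                   # P(E)
        post_s = joint_s / p_e                     # Bayes' Rule: P(Savings | E)
        implement = post_s * NET_IF_SAVINGS + (1 - post_s) * NET_IF_NO_SAVINGS
        total += p_e * max(implement, 0.0)         # 0.0 is "Do Not Implement"
    return total


if __name__ == "__main__":
    emv = max(PRIOR_SAVINGS * NET_IF_SAVINGS + (1 - PRIOR_SAVINGS) * NET_IF_NO_SAVINGS, 0.0)
    ev_ii = ev_given_imperfect_info()
    print(f"EV|II={ev_ii:.0f}  EVII={ev_ii - emv:.0f}")
```

Changing `P_GOOD_GIVEN_SAVINGS` and `P_GOOD_GIVEN_NO_SAVINGS` is a quick way to see how much a more (or less) trustworthy expert is worth.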

Articulating Value: The Cost of Unreleased Software

This spreadsheet contains all the calculations.

Static NPV Analysis Example: Circuit Breaker and Microservice Template

We show an example of articulating value for a simple (or not-so-simple) case, where multiple factors can be at play.

We are building a platform on Google Cloud Platform, consisting of a bunch of microservices. Many of these microservices are projected to call external APIs. Some of these APIs are prone to failure or extended downtimes; we need to be able to implement the circuit breaker pattern. We assume that one new microservice will be built per month for the next 6 months.

The development cost of each new microservice is $2000.
The rate of return (hurdle rate) is 10%. This will be used to calculate the Net Present Value of future costs and benefits.
Each new microservice also currently requires ground-up setup work. A microservice template or starter pack would reduce the work required to deploy future microservices.

Unfortunately, Istio is currently not being used. Istio is an open source service mesh that layers transparently onto existing distributed applications. If Istio were being used, we could have leveraged its circuit breaking capabilities fairly easily. We would like to advocate for using Istio in our ecosystem. Let us assume that we currently have no circuit breaker implementations at all. How can we build a business case around this?

There are a couple of considerations:

The deployment of the service mesh may be an expensive process.
The microservice template could also encapsulate a library-level circuit breaker implementation.
The microservice template would have other benefits that are not articulated in this example.

We will proceed in the following steps:

Articulate Tech Debt due to No Circuit Breaker
Articulate Library-level Circuit Breaker Option
Articulate Microservice Starter Pack-level Circuit Breaker Option
Articulate Service Mesh Circuit Breaker Option
Explore combinations of these options

All the calculations are shown in this spreadsheet.

1. Articulate Tech Debt due to No Circuit Breaker

Suppose we analyse the downtime our platform suffers each month because requests pile up behind slow or unresponsive third-party APIs, and estimate its cost at around $10000 per month. This cost and the cost of new microservice development are shown below.

The current cash outflow projected over 10 months, discounted to today, comes out to -$87785. This is the first step towards convincing stakeholders that they are losing money. Of course, we can project further into the future, but the uncertainty of the calculations grows the further out we go.
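The discounting behind a figure like this can be sketched as follows. The $10000 monthly downtime cost comes from the text; treating $2000 as the cost of each new microservice (one per month for the first 6 months), treating the 10% hurdle rate as annual and converting it to a monthly rate, and discounting at the end of each month are all assumptions, so this sketch will not necessarily reproduce the spreadsheet's -$87785.

```python
from __future__ import annotations


def discounted(cash_flows: list[float], rate_per_period: float) -> list[float]:
    """Discount each cash flow; cash_flows[k] is assumed to occur at the end of period k + 1."""
    return [cf / (1 + rate_per_period) ** (k + 1) for k, cf in enumerate(cash_flows)]


def status_quo_cash_flows(months: int, downtime_cost: float,
                          new_services: int, dev_cost: float) -> list[float]:
    """Monthly outflows with no circuit breaker: downtime every month, plus one
    new microservice per month for the first `new_services` months."""
    return [-(downtime_cost + (dev_cost if m < new_services else 0.0)) for m in range(months)]


if __name__ == "__main__":
    monthly_rate = 1.10 ** (1 / 12) - 1  # ASSUMPTION: 10% is an annual hurdle rate
    flows = status_quo_cash_flows(months=10, downtime_cost=10_000.0,
                                  new_services=6, dev_cost=2_000.0)
    print(round(sum(discounted(flows, monthly_rate))))
```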

We’d like to propose the following set of options.

2. Articulate Immediate Library-level Circuit Breaker Option

This one shows cash flows arising out of immediately incorporating a circuit breaker library in each new microservice. The cost of incorporating this library includes any integration code as well as configuration. This effort remains more or less constant with each new microservice.
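To make the "integration code as well as configuration" concrete, here is a minimal, hypothetical circuit breaker in Python; it is not the API of any particular library. The failure threshold and reset timeout are the kind of configuration each microservice would have to set, and wrapping every outbound call to a flaky third-party API is the integration code.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Minimal, hypothetical circuit breaker (single-threaded sketch).

    Closed: calls pass through and consecutive failures are counted.
    Open: after `failure_threshold` consecutive failures, calls fail fast.
    Half-open: once `reset_timeout_s` has elapsed, a trial call is let through;
    a failure re-opens the breaker, a success closes it again."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout_s:
                raise CircuitOpenError("circuit open; failing fast")
            # Half-open: one more failure will re-open the breaker.
            self._opened_at = None
            self._failures = self.failure_threshold - 1
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        return result
```

A call site would then look something like `breaker.call(lambda: fetch_exchange_rates())`, where `fetch_exchange_rates` is a placeholder for a third-party API call. Failing fast while the breaker is open is what stops requests piling up behind a slow dependency.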

3. Articulate Immediate Starter Pack Option

This one shows cash flows arising out of immediately beginning to implement a Starter Pack which can be used as a template for building new microservices. Circuit breaker functionality is also included in this starter pack. Any integration code is also present in the pack by default. Note that the starter pack would normally also have other benefits, like preconfigured logging, error handling, etc.

4. Articulate Immediate Service Mesh Option

This one shows cash flows arising out of immediately beginning to implement a Service Mesh.

5. Articulate Immediate Library + Delayed Starter Pack Option

This option immediately starts integrating a circuit breaker library to reduce downtimes, but starts work on the starter pack a couple of months down the line. Once the starter pack is functional, explicit integration of the circuit breaker library will no longer be needed for each new microservice.

6. Articulate Immediate Library + Delayed Starter Pack Option + Delayed Service Mesh Option

This option is the same as above, except that later on, it also begins to implement a service mesh. Once the service mesh is complete, integration of the circuit breaker functionality in the starter pack will no longer be needed, but it will still continue to provide other benefits, like reducing initial setup time for a new microservice.

7. Review, Rank, and Choose

Here, we chart the (negative) discounted costs per month for all our options.

We may naively choose the one which has the least potential cost in the near horizon (which is the immediate starter pack option), or we can choose one of the service mesh options, assuming that a service mesh is part of our architecture strategy and that the cost differential is not too large. Note that for the service mesh to be part of our architecture strategy, its other benefits need to be articulated using cash flows, compared against the option of just having the starter pack do all those things.
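The ranking step, and why the horizon matters, can be sketched in a few lines. The two cost profiles below are hypothetical and are not the spreadsheet's figures: a library option with a steady monthly cost, and a service mesh option with a large up-front deployment cost followed by a small monthly cost. The monthly discount rate is also an assumption.

```python
from __future__ import annotations

from typing import Callable

Schedule = Callable[[int], float]  # 0-based month index -> cash flow for that month


def npv_to_horizon(schedule: Schedule, horizon_months: int, monthly_rate: float) -> float:
    """Discounted sum of monthly cash flows, each at the end of its month."""
    return sum(schedule(m) / (1 + monthly_rate) ** (m + 1) for m in range(horizon_months))


def rank_options(options: dict[str, Schedule], horizon_months: int,
                 monthly_rate: float) -> list[tuple[str, float]]:
    """Options sorted from highest NPV (least negative cost) to lowest."""
    scored = [(name, npv_to_horizon(s, horizon_months, monthly_rate))
              for name, s in options.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical cost profiles, NOT the spreadsheet's figures.
OPTIONS: dict[str, Schedule] = {
    "library": lambda m: -3_000.0,
    "service_mesh": lambda m: -15_000.0 if m < 3 else -500.0,
}

if __name__ == "__main__":
    rate = 1.10 ** (1 / 12) - 1  # ASSUMPTION: 10% annual hurdle rate, compounded monthly
    for horizon in (6, 36):
        print(horizon, [name for name, _ in rank_options(OPTIONS, horizon, rate)])
```

With these made-up numbers, the cheaper option over 6 months is not the cheaper option over 36 months, which is exactly why the choice of horizon has to be made explicitly.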

Thus, it is important to know which factors are being taken into account when doing the NPV analysis. The final decision rests on all the relevant factors, not just an isolated one like the one we presented in this example.

Conclusion

There are several other topics that we will defer to the next post. The following is a possible list of topics we’ll cover going forward.

The Value of Security
The Value of Pair Programming
Value Chain Analysis

References

Books
- Real Options Analysis
Papers
- Real Options
- Valuation
Web
- Information Economics
  - Good Presentation on using multicriteria (tangible and non-tangible parameters) methods of Information Economics to link to software value
- Decision Theory
  - Video on Expected Value of Perfect and Imperfect Information
- Software Valuation
- Software Accounting
  - Overview of Software Capitalisation Rules
  - Accounting for external-use software development costs in an agile environment
  - External Use Software guidelines - FASB Accounting Standards Codification (ASC) Topic 985, Software
  - Internal Use Software guidelines - FASB Accounting Standards Codification (ASC) Topic 350, Intangibles — Goodwill and Other
  - Accounting for internal-use software using Cloud Computing development costs
  - Accounting for cloud development costs is covered under FASB Subtopic ASC 350-40 (Customer’s Accounting for Implementation Costs Incurred in a Cloud Computing Arrangement That Is a Service Contract).
  - Financial Reporting Developments: Intangibles - goodwill and other. The actual formal document is here.

Every Software Engineer is an Economist

2023-01-22T00:00:00+05:30

Background: This post took me a while to write: much of this is motivated by problems that I’ve noticed teams facing day-to-day at work. To be clear, this post does not offer a solution; only some thoughts, and maybe a path forward in aligning developers’ and architects’ thinking more closely with the frameworks used by people controlling the purse-strings of software development projects.

The posts in this series of Software Engineering Economics are, in order:

[WIP] Here is a presentation version of this article.

The other caveat is that even though this article touches on the topic of estimation, it does so only to talk about building uncertainty into estimates as a way to communicate risk and uncertainties to stakeholders, not to refine estimates. I won’t be extolling the virtues or limitations of #NoEstimates, for example (sidebar: the smoothest teams I’ve worked with essentially dispensed with estimation, but they also had excellent stakeholders).

“All models are wrong, but some are useful.” - George Box

Every software engineer is an economist; an architect, even more so. There is a wealth of literature around articulating the value of software development, and several agile development principles embody some of it. However, I see several issues in my day-to-day interactions with software engineers and architects.

Folks are reluctant to quantify things they build, beyond the standard practices they have been brought up on (like basic estimation exercises, test coverage). Some of this can be attributed to their prior bad experiences of being micromanaged via largely meaningless metrics.
Folks struggle to articulate value beyond a certain point to stakeholders who demand a certain measure of rigour and/or quantifiability. Similarly, engineers fail to communicate risk to decision-makers. The DORA metrics are good starter indicators, but I contend that they are not enough. Let me be as clear as possible: CxOs don’t really care about precious developer metrics; they really care about the savings or profits which result from improving those metrics.
There is a reluctance to rely too much on metrics because people think metrics are easily gamed. This can be avoided if we use econometric methods, because 1) falsified data is immediately apparent, and 2) showing the work steps, assumptions, and risks aids this transparency, since they are expressed in the language of economics, which business stakeholders understand much more easily.
Thinking about value and deciding tradeoffs based on economic factors is not something that is done enough, if at all, at the level of engineering teams. For example, questions like “Should I do this refactoring?” and “Why should we repay this tech debt?”, or “How are we better at this versus our competitor?” are usually framed in terms of statements which stop before traversing the full utility tree of value.

Thinking in these terms, and projecting these decisions in these terms to managers, heads/directors of engineering – but most importantly, to execs – is key to engineers articulating value in a manner which is compelling, and eases friction between engineering and executive management. It is also a skill engineers should acquire and practise to break several firms’ perceptions that “engineers are here to do what we say”.

This is easier said than done, because of several factors:

The data to apply these frameworks is not always easily available, and may require additional investment.
Engineers can get invested in decisions that they think are their “pet” ideas.
It can be hard to inculcate this mindset en masse among engineers if they do not have a clear perception of the value of adopting this mindset. Engineers don’t want theory, they want tools they can apply quickly and easily. Hence, the burden is on us to propose advances to the state of the art in a way that is actionable.

Most of the thinking and tools discussed in this article have been borrowed from the domains of financial engineering and economics. None of this material is new; a lot of research has been done in quantifying the value of software-related activities. The problem usually is translating those ideas into actions.

For these ideas to effectively work, they must permeate all the way across developers to tech leads to architects to managers. Thus, this article is divided into the following sections:

Communicating Uncertainty and Risk in Estimation Models
Articulating the Value of Timing (aka, Real Options)
Communicating Values and Risks of Tech Debt and Architectural Decisions
Deriving Value in Legacy Modernisation
Articulating the Value of Measurement (aka, the Cost of Information)

Simplifying Assumptions

The conversion of time to money is simply treated as the Cost to Company of a single individual. This is a lower bound, since there will usually be multiple people on a work item, and there may be other ancillary costs.

Key Concepts

1. Net Present Value and Discounted Cash Flow

The concept behind the Time Value of Money is to relate some amount of money in the future to its value in the present. The idea is that a certain amount of money today is worth more than the same amount in the future. This is because this money can be invested at some rate of return, which gives you returns in the future. Hence, receiving money earlier is better than receiving it later (because you can invest it right now). Similarly, spending money later is better than spending it right now, because that unspent money can earn interest. If $r$ is the rate of return (sometimes also called the hurdle rate), then $P_0$ (the amount of money right now) and the equivalent amount of money $P_t$ after $t$ time periods are related as:

\[P_0=\frac{P_t}{ {(1+r)}^t }\]

When making an investment, there are always projections of cash inflows and outflows up to some time in the future, in order to determine whether the investment is worth it. The sum of all of these cash flows, each discounted to its present value, is the Discounted Cash Flow, and is written as:

\[DCF(T)=\sum_{t=1}^T \frac{ CF(t)}{ {(1+r)}^t }\]

where $CF(t)$ is the cash flow at period $t$, and $r$ is the rate of return. Subtracting the investment from this value gives us the Net Present Value. If the NPV is positive, the investment is considered worth making, otherwise not.
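The DCF and NPV formulas above translate directly into a few lines of Python. The numbers are hypothetical: an investment of 100 returning 40 per period for three periods at $r$ = 10%. Although the undiscounted returns (120) exceed the investment, discounting makes the NPV slightly negative.

```python
def present_value(amount: float, rate: float, periods: int) -> float:
    """Discount an amount received `periods` periods from now: P_0 = P_t / (1 + r)^t."""
    return amount / (1 + rate) ** periods


def discounted_cash_flow(cash_flows: list[float], rate: float) -> float:
    """DCF(T) = sum over t = 1..T of CF(t) / (1 + r)^t; cash_flows[0] is CF(1)."""
    return sum(present_value(cf, rate, t) for t, cf in enumerate(cash_flows, start=1))


def net_present_value(investment: float, cash_flows: list[float], rate: float) -> float:
    """NPV = DCF minus the up-front investment made at t = 0."""
    return discounted_cash_flow(cash_flows, rate) - investment


# Hypothetical example: invest 100 now, receive 40 per period for 3 periods, r = 10%.
npv = net_present_value(100, [40, 40, 40], 0.10)
print(f"NPV = {npv:.2f}")  # NPV = -0.53
```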

2. Financial Derivative and Call Options

A Financial Derivative is a financial instrument (something which can be bought and sold) whose price depends upon the price of some underlying financial object (henceforth called “underlying”). For simplification, assume that this underlying is a stock. Thus the price of a derivative depends upon the price of the stock.

A Call Option is a kind of financial derivative. There are different kinds of call options; for the purposes of this discussion, we will discuss American Call Options, and simply refer to them henceforth as “option”. The following are the characteristics of a call option (options in general, in fact):

The option is associated with a specific stock.
The option costs money to buy. This is called the Option Premium. This is almost always less than the price of the underlying stock.
The option has an expiry date.
The option can be exercised at any time before its expiry date.
The option has a strike price, which is fixed at the time of purchase of the option. If the option owner exercises the option, they can buy the underlying stock at the strike price, regardless of the price of the stock at that time on the financial market.

The idea is that we can pay a (relatively) small amount to fix the price of the stock for the lifetime of the option. If we choose to never exercise the option, the option lapses, and we have incurred a loss (because we paid for the option premium).

Let’s take a simple example. The current stock price is $100. Let there be an option to buy this stock, with a strike price of $100. The option premium is $10.
We buy one option, and thus pay $10.
A few days later, the stock price rises to $120. We exercise the option, and buy the stock for $100, which is the strike price. We pay $100. We have paid a total of $110 so far.
We immediately sell the stock to the market at the current stock price of $120.
We have thus earned $120-$110=$10.

Thus, options allow us to speculate on rising stocks. It is worth noting that there is the counterpart to the Call Option, which is the Put Option, which gives us the option to sell a stock at the specified strike price.
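The call option example above can be reduced to a single payoff rule. If the stock trades above the strike at exercise, we gain the difference. Otherwise we let the option lapse and lose only the premium. The second call below is a hypothetical variation in which the stock falls to $90.

```python
def call_option_profit(spot_at_exercise: float, strike: float, premium: float) -> float:
    """Net profit of one call option.

    Exercising only makes sense if the stock trades above the strike; otherwise
    the option lapses and the loss is limited to the premium already paid.
    """
    payoff = max(spot_at_exercise - strike, 0.0)
    return payoff - premium


# The worked example above: strike $100, premium $10, stock rises to $120.
print(call_option_profit(120.0, 100.0, 10.0))  # 10.0
# Hypothetical: the stock falls to $90, so we let the option lapse.
print(call_option_profit(90.0, 100.0, 10.0))   # -10.0
```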

1. Articulating Value: Communicating Uncertainty and Risk in Estimation Models

Scenario: The team is asked to estimate a certain piece of work. The developers and analysts put together the usual RAIDs (Risks, Assumptions, Issues, Dependencies), and come up with a number (or, if they are slightly more sophisticated, they throw a minimum, most likely, and maximum value for each story). They end up adding up the maximum values to get an “upper bound”, do the same thing to the other two sets of estimates to get a total lower bound, and a total likely estimate. The analyst or the manager goes “This is too high!”. The developers go back to their estimates and start scrutinising the estimates, all in the hope of finding something they can reduce. Most of the time, they simply end up lowering some estimates (by fiat, or common agreement); this may be accompanied by a rational explanation or not: the latter is usually more common.

Happy with this number, the manager marches off to the client and shows off this estimate. The budget is approved; work commences. Then along comes the client all indignant: “We are not meeting the sprint commitments! The team is not moving fast enough!” Negotiations follow. No side ends up happy.

There are so many things wrong in the above picture; unfortunately, this can happen more often than not. What has happened here is a failure of communication; between the developers and the manager, and between the team and the client. One of the primary reasons for this is the false sense of accuracy and precision that comes with ending up with a single number, and the lack of tools to articulate the uncertainty behind this number. What does “upper bound” mean? Are you saying it will never go past this number?

If there is a clear way of communicating this uncertainty, the team can make an informed decision of what level of risk they are taking up when committing to a certain estimate. The client would certainly appreciate this, instead of receiving a single number which ends up being treated as an ironclad guarantee of the date of delivery.

Thankfully, we can communicate this uncertainty using some time-tested statistical tools.

Estimation Procedure using Confidence Levels

We assume that the estimate of a story is normally distributed. A potentially better candidate could be the log normal distribution, but let’s keep it simple for now.
When you throw an estimate, pick a range. This range is not simply an “upper bound” and “lower bound”; it answers the question: “I’m 90% certain that it falls between $x$ and $y$.” We don’t bother with the most likely estimate in this scenario (it might matter if we were using something other than a Gaussian distribution, but let’s keep it simple).
Calculate the standard deviation $\sigma$ for a confidence level of 0.9 (the corresponding Z-score is approximately 1.645). Note that the confidence interval is defined as $\hat{X} \pm Z\sigma$, so $\sigma = (y - x)/(2Z)$.
Do this for each story.
Calculate the distribution of the sum of all the random variables (one per story). This is easy if we assume all the estimate distributions are Gaussian and independent: the means add, and the variances add. If not, perform Monte Carlo simulations. In the Gaussian case, the result is a new normal distribution that represents the aggregate of all your estimate distributions.
Pick a range of estimates based on an acceptable confidence level. Alternatively, pick an acceptable range of estimates, record the confidence level, and acknowledge the risk. Communicate this range and the confidence with the client.
Negotiation with the client (or within the team) should happen around acceptable levels of uncertainty, not around modifying story estimates to fit a particular target. As long as all parties acknowledge the risk level, the uncertainty is explicitly communicated, which may preempt the client coming back disappointed because the recorded effort exceeded a single number.
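The procedure above can be sketched with Python's standard `statistics.NormalDist`, assuming independent, normally distributed story estimates. The four story ranges are hypothetical and are not the ones from the spreadsheet. Each 90% range becomes a normal distribution with $\sigma = (y - x)/(2Z)$, and the aggregate adds means and variances. Even with four stories, the 90% bounds come out noticeably tighter than the naive sum of bounds.

```python
from math import sqrt
from statistics import NormalDist

Z_90 = NormalDist().inv_cdf(0.95)  # two-sided 90% interval, Z ≈ 1.645


def story_distribution(low: float, high: float) -> NormalDist:
    """Turn 'I'm 90% sure it falls between low and high' into a normal distribution."""
    mean = (low + high) / 2
    sigma = (high - low) / (2 * Z_90)
    return NormalDist(mean, sigma)


def aggregate(stories: list[NormalDist]) -> NormalDist:
    """Sum of independent normals: the means add and the variances add."""
    mean = sum(s.mean for s in stories)
    sigma = sqrt(sum(s.variance for s in stories))
    return NormalDist(mean, sigma)


def interval(dist: NormalDist, confidence: float = 0.9) -> tuple[float, float]:
    """Symmetric confidence interval read off the aggregate distribution."""
    tail = (1 - confidence) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


# Hypothetical story ranges (not the spreadsheet's data).
ranges = [(10, 30), (20, 40), (15, 35), (25, 45)]
total = aggregate([story_distribution(lo, hi) for lo, hi in ranges])
naive = (sum(lo for lo, _ in ranges), sum(hi for _, hi in ranges))
print("naive bounds:", naive)                                    # (70, 150)
print("90% bounds:", tuple(round(b, 1) for b in interval(total)))  # (90.0, 130.0)
```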

See this spreadsheet for a sample calculation. In the diagram below, the normal distribution on the far right is the final distribution resulting from convolving all the story estimates (which are normal distributions themselves). The Y-axis has been scaled by 1000 for ease of visualisation.

As you can see, naively summing the lower and upper bounds gives us 210 and 385. In fact, it is misleading to call these simply lower and upper bounds. They are bounds, but in this case, we want to use the term 90% confidence level upper/lower bounds. This implies that the estimators are 90% sure that the estimate for the first story (for example) lies between 10 and 30. Using this metric and proper convolution techniques yields bounds of 270 and 324, which differ from the result of naive summation and are the correct result. With more stories, the gap between the convolution approach and naive summation increases. One point about the 90% confidence level: whether narrowing this uncertainty is worth it (without artificially manipulating numbers) is the subject of the discussion in Articulating Value: The Value of Measurement. However, the point is to not settle on a single number, but to always use a range of values. This, in itself, is not new. However, the upper and lower bounds are always taken as fixed, without any discussion around the risk involved in picking a lower estimate.

This is what the above calculation brings out. In this simplified example, we have chosen the estimates to be normal distributions, to keep calculations simple. The distribution could even be a fat-tailed one like the Log-Normal Distribution (to bias it towards higher estimates), but then we’d need to run Monte Carlo simulations to come up with the data. So, let’s keep it simple for now.

The correct approach of convolving the estimate distributions of all the stories results in the single normal distribution above. With this graph, we can answer questions like:

What are the upper and lower bounds with 90% confidence? 324 and 270, respectively, which is different from the result of naively summing the upper and lower bounds.
Suppose we want to use a lower estimate of the upper bound, say, 310; what then is the risk of being wrong? The answer is 23%, which you can calculate for yourself by going to the spreadsheet directly.
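The second question above amounts to evaluating the upper tail of the aggregate distribution. Here is a small sketch using a hypothetical aggregate (mean 110, with a 90% interval of 90 to 130) rather than the spreadsheet's data. By construction, committing to 130 leaves a 5% chance of overrun, and every lower commitment raises that risk.

```python
from statistics import NormalDist

# Hypothetical aggregate distribution: mean 110, 90% interval of 90..130.
Z_90 = NormalDist().inv_cdf(0.95)
total = NormalDist(mu=110, sigma=20 / Z_90)


def overrun_probability(dist: NormalDist, committed_upper_bound: float) -> float:
    """Probability that the actual effort exceeds the number we commit to."""
    return 1 - dist.cdf(committed_upper_bound)


for bound in (115, 120, 125, 130):
    print(f"commit to {bound}: {overrun_probability(total, bound):.0%} chance of overrun")
```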

The idea is that you can now communicate risk in your estimates in the form of risk exposure. This is done by computing the expected overshoot beyond your chosen upper bound $b$: the excess $(x - b)$ weighted by the probability density $f(x)$ of the normal distribution and integrated from $b$ to $\infty$, i.e., $\int_b^\infty (x - b)\,f(x)\,dx$. In this case, risk exposure communicates how much extra time (and consequently, money) is expected to be expended if the estimate overshoots 310 (assuming the budget was allotted only for 310).
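For a normal distribution, the risk exposure integral has a standard closed form: $(\mu - b)(1 - \Phi(z)) + \sigma\,\phi(z)$ with $z = (b - \mu)/\sigma$. The sketch below evaluates it for a few budgets, which traces out a risk exposure curve. The distribution (mean 110, standard deviation 12) and the cost per unit of effort are hypothetical placeholders.

```python
from statistics import NormalDist


def expected_overshoot(dist: NormalDist, budgeted: float) -> float:
    """Risk exposure: E[max(X - budgeted, 0)] for X ~ Normal(mu, sigma).

    Closed form of the integral of (x - b) f(x) from b to infinity:
    (mu - b) * (1 - Phi(z)) + sigma * phi(z), with z = (b - mu) / sigma.
    """
    mu, sigma = dist.mean, dist.stdev
    z = (budgeted - mu) / sigma
    standard = NormalDist()
    return (mu - budgeted) * (1 - standard.cdf(z)) + sigma * standard.pdf(z)


# Hypothetical aggregate distribution and cost of one unit of effort.
total = NormalDist(mu=110, sigma=12)
COST_PER_UNIT = 500.0  # hypothetical money per estimation unit

for budget in (110, 120, 130, 140):
    exposure = expected_overshoot(total, budget)
    print(f"budget {budget}: expected overshoot {exposure:.2f} units, "
          f"about {exposure * COST_PER_UNIT:,.0f} in money")
```

Because the result is an expectation, multiplying it by a cost per unit turns it directly into the money figure that decision-makers care about.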

The risk exposure curve for the above scenario is shown below:

Interesting Note: The IEEE-CS/ACM Software Engineering Code of Ethics and Professional Practice requires software professionals to provide an uncertainty assessment along with their estimates.

2. Articulating Value: The Value of Timing (aka, Real Options)

Real Options

Competent Architects and Engineers identify Real Options. Good Architects and Engineers create Real Options.

We have already talked about options earlier. Here we talk about Real Options, which are the strategic equivalent of Call Options. Most of the characteristics remain the same; however, real options are not traded on financial markets, but are used as a tool to optimise investments. We will delve into some of their possible applications in architectural decision-making and technical debt repayment, by way of example. Specifically, the YAGNI principle derives from the Real Options approach. See the following references for excellent discussions on the topic:

Software Design Decisions as Real Options
The Software Architect Elevator
Chapter 4 of Extreme Programming Perspectives
Chapter 3 of Value-Based Software Engineering

Here is an example. Let us assume that we have an Architecture Decision that we’d like to implement. The investment to implement this is 70. We project the following probabilities:

30% chance that the change will result in savings of 45 (in the current legacy process) per month for the next 3 months
40% chance that the change will result in savings of 30 (in the current legacy process) per month for the next 3 months
30% chance that the change will result in savings of 15 (in the current legacy process) per month for the next 3 months

Furthermore, we have determined that the Risk-Free Rate of Interest and the Risk Interest Rate are 6% and 10%, respectively. These will be used to calculate the Discounted Cash Flows.

The two scenarios are presented in this spreadsheet.

We see that the Expected Net Present Value is 3.9. This is a positive cash flow, so we might be tempted to implement the architecture decision right now. However, consider the risk. There is a 30% chance that the investment will be more than the savings and that we will end up with a negative cash flow of 25.

Let us assume that we wait a month to gather more data or, more importantly, run a spike to validate that this architecture will pan out to give us the desired savings. How much should we invest in the spike? Usually, spikes are timeboxed, but for larger architecture decisions, we can also put an economic upper bound on the investment we want to make in the spike.

The second set of calculations above show the second scenario of waiting a month. We see that if we can eliminate the uncertainty of incurring a loss (i.e., the [30%,15] scenario), the Net Present Value of the endeavour comes to 11.33. This is much higher than the NPV of the first scenario. This implies that waiting for one month doing the spike, and then making a decision is more valuable.

More importantly, this value of 11.33 gives us the Option Premium, which is the maximum amount we’d be willing to pay in order to eliminate this uncertainty of loss. Note that this number is much less than the investment we’d have to make. Essentially, we are paying the price of eliminating uncertainty, and we’d like to make sure that this price is not too high.

Incidentally, the above calculations use the Datar-Mathews method, because its parameters are easier to estimate. It also gives results consistent with the famous Black-Scholes Model, which is used to price derivatives in financial markets.
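Here is a minimal sketch of the invest-now versus wait-a-month comparison, in the spirit of the Datar-Mathews method. Savings are discounted at the risk rate and the investment at the risk-free rate. After waiting, we invest only in scenarios with a positive NPV. Several conventions here are our own assumptions: the rates are applied per month, savings start the month after investing, and waiting resolves all uncertainty. The outputs are therefore not meant to reproduce the spreadsheet's 3.9 and 11.33, only the shape of the argument.

```python
# Hypothetical conventions (the article's spreadsheet may differ):
# - rates are applied per month; savings use RISK_RATE, the investment RISK_FREE_RATE
# - investing at month `start` yields savings in months start+1 .. start+MONTHS
RISK_FREE_RATE = 0.06
RISK_RATE = 0.10
INVESTMENT = 70.0
MONTHS = 3
SCENARIOS = [(0.3, 45.0), (0.4, 30.0), (0.3, 15.0)]  # (probability, monthly saving)


def scenario_npv(monthly_saving: float, start: int) -> float:
    """NPV (as of today) of investing at month `start` in one scenario."""
    savings = sum(monthly_saving / (1 + RISK_RATE) ** t
                  for t in range(start + 1, start + MONTHS + 1))
    cost = INVESTMENT / (1 + RISK_FREE_RATE) ** start
    return savings - cost


# Invest now: we are committed to every scenario, including the losing one.
invest_now = sum(p * scenario_npv(s, start=0) for p, s in SCENARIOS)
# Wait one month (e.g. run a spike), then invest only if the scenario pays off.
wait_and_see = sum(p * max(scenario_npv(s, start=1), 0.0) for p, s in SCENARIOS)

print(f"ENPV, invest now:      {invest_now:.2f}")
print(f"ENPV, wait one month:  {wait_and_see:.2f}")
```

The `max(..., 0.0)` is the option: after the spike, we are free to walk away from the losing scenario instead of absorbing its loss.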

Examples

One example where we could have applied this: the team had built a data engineering pipeline using Spark and Scala. The stakeholder felt that hiring developers with the requisite skillsets would be hard, and wanted to move to plain Java-based processing. A combination of cash flow modelling and buying the option of redesign would probably have made for a compelling case.

So, to reiterate: real options are valuable because they allow us to make smaller investments to eliminate uncertainty on the return on investment for a large investment, without actually making that investment immediately, but deferring it. The value comes from deciding whether to defer this investment or not, whether this investment is implementing an architectural decision, or repaying tech debt. In many situations, the Real Option Premium is effectively zero, which means we don’t really need to do anything, but can just wait for more information on whether the investment seems worthwhile or not.

More philosophically, every line of code we write is an investment that we are making right now: an investment which might be worth delaying. Articulating this value concretely between engineers grounds a lot of discussions on what is really valuable to stakeholders, and will preempt a lot of bike-shedding.

3. Articulating Value: Economics and Risks of Tech Debt and Architectural Decisions

Here is some research relating Development Metrics to Wasted Development Time:

The cost of Architectural Technical Debt (ATD) = principal (the amount to pay to implement the architectural change) + interest (the continuing costs incurred by not implementing it).
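The principal/interest split turns repayment into a comparison of cash flows. The sketch below uses a deliberately simple, hypothetical rule: repay now if the present value of the interest we would otherwise keep paying over a planning horizon exceeds the principal. It ignores risk-related costs such as security breaches. The monthly interest, horizon, and rate are all made-up inputs.

```python
def pv_of_interest(monthly_interest: float, months: int, rate: float) -> float:
    """Present value of the recurring 'interest' we keep paying if the debt stays."""
    return sum(monthly_interest / (1 + rate) ** t for t in range(1, months + 1))


def should_repay(principal: float, monthly_interest: float, months: int, rate: float) -> bool:
    """Hypothetical rule: repay if discounted interest over the horizon exceeds the principal."""
    return pv_of_interest(monthly_interest, months, rate) > principal


# Hypothetical debt: principal 100, interest 10 per month, 1% monthly discount rate.
print(should_repay(100.0, 10.0, months=6, rate=0.01))   # False
print(should_repay(100.0, 10.0, months=12, rate=0.01))  # True
```

The horizon matters: the same debt is not worth repaying if the module is retired in six months, but is worth repaying if it lives for a year or more.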

The following is an example of what the cash flows of an architectural decision might look like.

graph LR; architecture_decision[Architecture Decision]-->atd_principal[Cost of Architectural Decision: Principal]; architecture_decision-->recurring_atd_interest[Recurring Costs: Interest]; architecture_decision-->recurring_atd_savings[Recurring Development Savings]; architecture_decision-->atd_option_premium[Architecture Option Premium]; style architecture_decision fill:#006fff,stroke:#000,stroke-width:2px,color:#fff

Incorporating economics into daily architectural thinking

Here are some generic tips.

Practise drawing causal graphs. Complete the trace all the way up to where the perceived benefit (money) is. It may be tempting to stop if you reach a DORA metric. Don’t; get to the money.
If you are already measuring DORA metrics, relentlessly ask what each DORA metric translates to in terms of money.
Along the way of the graph, list out other incidental cash outflows.
Build an option tree. Deduce whether it is better to defer execution, or do it right now. See Articulating Value (The Value of Timing, aka Real Options) for guidance on this.
Examples of architectural options are (see Articulating Value: The Value of Timing):
- Architecture Seams in Monoliths
- Spikes
- Simply waiting (YAGNI - You Aren’t Gonna Need It)
Non-technical things can also be calculated, e.g., the need for training.
These metrics must be measured as part of standardised project protocols.

Here are some tips for specific but standard cases.

1. The Economics of Microservices

If you are suggesting a new microservice for processing payments, these might be the new cash flows, as an example:

Recurring Cash Flows
- Transactions: New cash inflow
- Cost of recovering the whole system back from failure: Reduced cash outflow
- Cost of cloud resources to scale the new microservice: New cash outflow
- Cost of higher latency leading to lower service capacity (if the microservice is part of a workflow): Decreased cash inflow, depending on whether you ever reach the load limits of the service before other parts of the system start to fail
- Cost of fixing bugs: New cash outflow, depending upon complexity of the microservice
- Cost of Integrations:
Single or Few-Time Cash Flows
- Cost of development: New cash outflow
- Cost of deployment setup: New cash outflow (ideally should be as low as possible)
Option Premium
- Architecture Seam (see Articulating Value: The Value of Timing)
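One way to keep these categories honest is to record them in a small structure and discount them together. The sketch below is hypothetical throughout: the class, item names, amounts, and the 1% monthly rate are illustrative. Positive numbers are inflows (or reduced outflows), negative numbers are outflows, one-time flows and the option premium are assumed to occur up front, and recurring flows are discounted monthly.

```python
from dataclasses import dataclass, field


@dataclass
class CashFlowModel:
    """Hypothetical container for an architecture decision's cash flows."""
    recurring_monthly: dict[str, float] = field(default_factory=dict)
    one_time: dict[str, float] = field(default_factory=dict)
    option_premium: float = 0.0  # paid up front, e.g. for an architecture seam

    def npv(self, months: int, monthly_rate: float) -> float:
        monthly_net = sum(self.recurring_monthly.values())
        recurring_pv = sum(monthly_net / (1 + monthly_rate) ** t for t in range(1, months + 1))
        return recurring_pv + sum(self.one_time.values()) - self.option_premium


payments_service = CashFlowModel(
    recurring_monthly={
        "new transactions": 12.0,
        "cheaper recovery from failure": 3.0,
        "cloud resources": -4.0,
        "bug fixing": -2.0,
    },
    one_time={"development": -60.0, "deployment setup": -5.0},
    option_premium=8.0,
)
print(f"24-month NPV: {payments_service.npv(months=24, monthly_rate=0.01):.2f}")
```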

graph LR; microservice[Microservice ADR]-->database[Cloud DB Resources]; microservice-->hosting[Cloud Hosting Resources]; microservice-->development_cost[Development Cost]; microservice-->latency[Latency]; microservice-->bugs[Fixing bugs]-->bugfix_time[Wasted Bugfix Time Costs]; microservice-->downtime[Downtime]-->lost_transactions[Lost Transaction Costs]; microservice-->microservice_option_premium[Architecture Seam: Option Premium]; style microservice fill:#8f0f00,stroke:#000,stroke-width:2px,color:#fff

2. The Economics of Technical Debt repayment

Recurring Cash Flows
- Cost of Manual Troubleshooting and Resolution
- Cost of recurring change to a specific module
Single or Few-Time Cash Flows
- Cost of repaying tech debt
Option Premium
- The cost of isolating the effect of the technical debt from affecting other code (see Articulating Value: The Value of Timing)

The following is an example of what a value tree of (general) tech debt might look like.

graph LR; debt[Tech Debt]-->principal[Cost of Fixing Debt: Principal]; debt-->interest[Recurring Cost: Interest]; debt-->td_option_premium[Tech Debt Option Premium]; debt-->risk[Risk-Related Cost, eg, Security Breach]; style debt fill:#006f00,stroke:#000,stroke-width:2px,color:#fff

Example Tech Debt Cash Flow

We see options thinking happening on projects to some degree; however, the options are neither explicitly articulated, nor is the value of these decisions explicitly communicated to stakeholders. This is corroborated in the paper How Do Real Options Concepts Fit in Agile Requirements Engineering?, where the authors attempt to answer some research questions. The ones of interest are listed below, and the relevant excerpts are quoted from this paper.

Research Question: What is the level of agile software organizations’ awareness of using options thinking in support of agile requirements reprioritization?

“Both the literature sources and the case study showed that there is awareness in the organizations, and that they apply option thinking for making mid-course project decisions, both from clients and developers perspective. Although the agile companies propagate development process driven only by value creation for the client, we observed that in practice option thinking is intrinsic for the developers as well. They consider trade offs between quality and schedule…”

Research Question: In which way does options-thinking add value?

“In the searched literature, we could not find a case where options are explicitly documented and compared in terms of value or in other quantitative way.”

Research Question: Which aspects of using options thinking in agile RE can be recognized as topics for future research?

“…we found that the options thinking is mostly described in terms of how it works for developers’ organizations. The perspective of the clients’ organizations seems under-researched.”

“We must note that in the literature, we found instances of using options-thinking which represent anecdotic experiences of either agile consultants or agile-practice-adopting organizations. We were really surprised that we couldn’t find a more substantive evidence that could be used to answer our research questions.”

“In both the literature review and the case study, we found that options are not expressed in quantitative terms. This finding makes us think that it may not be realistic at all to expect agile teams to reason about options quantitatively. Whether this is the case or not is a line for future research.”

(I emphatically contend that quantification of value in concrete economic terms should be one of the building blocks of articulating this value.)

4. Articulating Value: Deriving Value in Legacy Modernisation

Legacy Modernisation is an involved beast, and usually there are far too many variables to create an exhaustive model. However, a candidate cost model is a good starting point. We’ll write more about this going forward.

$C_{HW}$ = Cost of Hardware / Hosting
$C_{HUF}$ = Cost of manual work equivalent of feature (if completely new feature or if feature has manual interventions)
$C_{RED}$ = Cost of recovery, including human investments (related to MTTR)
$C_{LBD}$ = Cost of lost business / productivity during downtime (related to MTTR)
$C_{ENF}$ = Cost of development of an enhancement to a feature (related to DORA Lead Time)
$C_{NUF}$ = Cost of development of a new feature (related to DORA Lead Time)
$C_{BUG}$ = Cost of bug fixes for feature
$n_D$ = Number of downtime incidents per year
$n_E$ = Number of enhancements to feature per year
$n_B$ = Number of bugs in feature per year
$n_F$ = Number of new features per year

The cost of a feature $i$ is then denoted by $V_i$, and the total cost of the legacy system by $V_{legacy}$. These are given by:

\[V_i = C_{HUF} + n_D \cdot (C_{RED} + C_{LBD}) + n_E \cdot C_{ENF} + n_B \cdot C_{BUG}\]
\[V_{legacy} = \sum_{i} V_i + C_{HW} + n_F \cdot C_{NUF}\]

In legacy modernisation, the idea is to reduce this cost, so that $V_{legacy}-V_{modern} > 0$, where $V_{modern}$ is the same cost computed for the modernised system. Retention of customer base is also a valid use case, which we will touch upon in sequels.
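The cost model above can be encoded directly. The sketch below compares a hypothetical legacy system against a hypothetical modernised one, each with a single feature. All figures are made up purely to show how $V_{legacy} - V_{modern}$ is computed. It does not include one-off migration costs or customer retention, which the model above also leaves out.

```python
from dataclasses import dataclass


@dataclass
class FeatureCosts:
    """Cost drivers for one feature, using the symbols from the model above."""
    c_huf: float  # manual-work equivalent of the feature
    c_red: float  # cost of recovery per downtime incident
    c_lbd: float  # lost business per downtime incident
    c_enf: float  # cost per enhancement
    c_bug: float  # cost per bug fix
    n_d: int      # downtime incidents per year
    n_e: int      # enhancements per year
    n_b: int      # bugs per year

    def cost(self) -> float:
        # V_i = C_HUF + n_D (C_RED + C_LBD) + n_E C_ENF + n_B C_BUG
        return (self.c_huf + self.n_d * (self.c_red + self.c_lbd)
                + self.n_e * self.c_enf + self.n_b * self.c_bug)


def system_cost(features: list[FeatureCosts], c_hw: float, n_f: int, c_nuf: float) -> float:
    # V = sum_i V_i + C_HW + n_F C_NUF
    return sum(f.cost() for f in features) + c_hw + n_f * c_nuf


# Hypothetical numbers for one feature before and after modernisation.
legacy = system_cost([FeatureCosts(20, 5, 15, 8, 2, n_d=4, n_e=6, n_b=20)], c_hw=50, n_f=3, c_nuf=30)
modern = system_cost([FeatureCosts(5, 2, 5, 4, 1, n_d=1, n_e=6, n_b=8)], c_hw=40, n_f=3, c_nuf=15)
print(f"V_legacy - V_modern = {legacy - modern:.1f}")  # 199.0
```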

5. Articulating Value: The Value of Metrics (aka, the Cost of Information)

For a metric to have economic value, it must support a decision. Examples of decisions are:

The investment will either be made or not. Alternatively, the amount of investment will be more or less.
Teams will be restructured or not.
A feature will go live or not.
A system (or subsystem) will be modernised or not.
A system will be either bought or built in-house.

Characteristics of a Decision

Must have 2 or more realistic alternatives. These alternatives cannot be recursive, i.e., the decision based on a certain measurement should not be to take action to modify that measurement.
A decision has uncertainty.
A decision has potentially negative consequences.
A decision must have a decision maker.

Quantify the Decision Model. The Decision Model will probably have multiple variables.

We need to decide how important each of these variables is in making the decision. If a measurement has zero information value, then it is not worth measuring. When multiple variables are involved, use the Expected Value of Perfect Information (EVPI) coupled with Monte Carlo simulations (assuming the decision model has been quantified) to decide on the most important metrics.
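EVPI is the difference between the expected payoff if we could always pick the best decision after seeing the outcome, and the expected payoff of the single best decision chosen without that knowledge. Below is a minimal Monte Carlo sketch for one uncertain variable. The decision (modernise at a cost of 40, or keep the system as is) and the Normal(50, 80) distribution of yearly savings are hypothetical. Ranking several variables requires the per-variable (partial) version of EVPI, which this sketch does not cover.

```python
import random
from typing import Callable, Sequence


def evpi(
    decisions: Sequence[str],
    payoff: Callable[[str, float], float],
    sample_outcome: Callable[[random.Random], float],
    n: int = 100_000,
    seed: int = 42,
) -> float:
    """Monte Carlo estimate of the Expected Value of Perfect Information."""
    rng = random.Random(seed)
    outcomes = [sample_outcome(rng) for _ in range(n)]
    # Without further information, we must commit to the single best decision on average.
    best_without_info = max(sum(payoff(d, x) for x in outcomes) / n for d in decisions)
    # With perfect information, we could pick the best decision for every outcome.
    best_with_info = sum(max(payoff(d, x) for d in decisions) for x in outcomes) / n
    return best_with_info - best_without_info


# Hypothetical decision: modernise a subsystem (costs 40) or keep it as is (costs nothing).
def modernisation_payoff(decision: str, saving: float) -> float:
    return saving - 40.0 if decision == "modernise" else 0.0


value = evpi(["modernise", "keep"], modernisation_payoff, lambda rng: rng.gauss(50.0, 80.0))
print(f"EVPI ≈ {value:.1f}: an upper bound on what measuring the saving is worth")
```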

Before we get into the nitty-gritty of how to actually measure this, let’s talk about the value chain, which traces a metric to the business decision it facilitates.

In general, any metric’s value tree should encapsulate (most of) the following elements.

graph LR; metric[Metric]-->speed[Speed]-->time_to_market[Time to Market]-->first_mover_fast_follower[First Mover/Fast Follower Economic Advantage]; time_to_market-->time_value[Time Value of Savings/Profits]; first_mover_fast_follower-->|No|no_invest[Don't Invest]; first_mover_fast_follower-->|Yes|invest[Invest]; time_value-->|Low|less_invest[Invest Less]; time_value-->|High|more_invest[Invest More]

It is also important to note that no single metric contributes to the speed effect on its own. Other factors, like development effort, are key inputs in custom software development. Let’s speak of the values which a metric can be traced to.

First Mover/Fast Follower Economic Advantage: The advantage gained by getting to market first with a novel product or feature is not to be underestimated. This is the First Mover Advantage. However, the First Mover Advantage has been disputed with the proposition that the Second Mover / Fast Follower Advantage may be significantly less risky, and as profitable, if not more. Regardless of the debate in this area, speed contributes significantly to gaining either advantage.
Time Value of Savings/Profits: The value of speed does not lie only in a first mover advantage. Even if we discount such an advantage, savings (or profits) made earlier are always more valuable than the same amount gained at a later point in time, as we noted in Key Concepts. Essentially, the later the client starts seeing the profits/savings, the more money they are losing. At the risk of repeating the concept, this is because savings or profits made right now could be invested and earn returns in the meantime.
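A minimal sketch of the time value argument, assuming simple annual compounding. The amount of savings and the 10% annual rate are hypothetical figures chosen only for illustration.

```python
def present_value(amount: float, annual_rate: float, years: float) -> float:
    """Value today of `amount` received `years` from now, discounted at `annual_rate` per year."""
    return amount / (1 + annual_rate) ** years


def cost_of_delay(amount: float, annual_rate: float, delay_years: float) -> float:
    """Present value lost by receiving `amount` after `delay_years` instead of today."""
    return amount - present_value(amount, annual_rate, delay_years)


savings = 100_000  # hypothetical saving delivered by a feature
rate = 0.10  # hypothetical annual rate of return the client could otherwise earn
print(round(present_value(savings, rate, 1), 2))  # 90909.09
print(round(cost_of_delay(savings, rate, 0.5), 2))
```

The further the savings are pushed out, the larger `cost_of_delay` becomes, which is the value that speed ultimately protects.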

The Economics of DORA Metrics

What business decisions do DORA metrics support? We can follow the above value tree, and see that they fit in very well with the template.

Deployment Frequency is a proxy for speed of feature development, which is itself a proxy for time to market.
Lead Time for Changes is a proxy for speed of feature development, which is itself a proxy for time to market.
Mean Time to Recovery is a metric for financial loss during downtime.
Change Failure Rate is a proxy for bugs reaching production, which slow down feature development, itself a proxy for time to market.

This is an example value tree for DORA metrics.

graph LR; df[Deployment Frequency]-->speed[Speed]-->time_to_market[Time to Market]-->first_mover_fast_follower[First Mover/Fast Follower Economic Advantage] mlt[Lead Time for Changes]-->speed cfl[Change Failure Rate]-->bugs[Bugs]-->speed time_to_market-->time_value[Time Value of Savings/Profits] first_mover_fast_follower-->|No|no_invest[Don't Invest] first_mover_fast_follower-->|Yes|invest[Invest] time_value-->|Low|less_invest[Invest Less] time_value-->|High|more_invest[Invest More]

There are a lot more concepts that I’d like to cover, including:

Expected Value of Perfect Information
Possible procedures for determining the value of a metric
When is a metric’s performance good enough?
Value Tree Repository
The Cost of Unreleased Software

I will continue adding more information on the topic of the value of metrics going forward. Stay tuned.

References

Books
Papers
- How Do Real Options Concepts Fit in Agile Requirements Engineering?
- Making Architecture Design Decisions: An Economic Approach describes a pilot study of a modified CBAM approach applied at NASA.
- Software Design Decisions as Real Options
- A Practical Method for Valuing Real Options: The Boeing Approach describes the Datar-Matthews approach used in the real options example in this article.
- Code Red: The Business Impact of Code Quality - A Quantitative Study of 39 Proprietary Production Codebases
- The financial aspect of managing technical debt: A systematic literature review
- The Pricey Bill of Technical Debt: When and by Whom will it be Paid?
- Software Risk Management: Principles and Practices
- Generalization of an integrated cost model and extensions to COTS, PLE and TTM
Web

Transformers using PyTorch : Worklog Part 2

2023-01-14T00:00:00+05:30

We continue looking at the Transformer architecture from where we left off in Part 1. When we stopped, we had set up the Encoder stack, but had stopped short of adding positional encoding and starting work on the Decoder stack. In this post, we will focus on setting up the training cycle.

Specifically, we will cover:

Positional Encoding
Decoder stack, including the masked multi-head attention mechanism
Set up the basic training regime via Teacher Forcing

We will also lay out the dimensional analysis a little more clearly, and add necessary unit tests to verify intended functionality. The code is available here.

Positional Encoding

You can see the code for visualising the positional encoding here. Both images below show the encoding map at different levels of zoom.
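For reference, here is a minimal sketch of the sinusoidal positional encoding from the original Transformer paper, assuming d_model = 512. The function name is hypothetical, and this is not necessarily how the author’s implementation computes it; each row is simply added to the embedding of the word at that position.

```python
import math

import torch


def sinusoidal_positional_encoding(num_positions: int, d_model: int = 512) -> torch.Tensor:
    """Returns a (num_positions, d_model) matrix where, for each position pos:

    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    positions = torch.arange(num_positions, dtype=torch.float32).unsqueeze(1)  # (n, 1)
    even_dims = torch.arange(0, d_model, 2, dtype=torch.float32)  # 2i = 0, 2, 4, ...
    inverse_frequencies = torch.exp(-math.log(10000.0) * even_dims / d_model)  # (d_model/2,)
    angles = positions * inverse_frequencies  # (n, d_model/2) via broadcasting
    encoding = torch.zeros(num_positions, d_model)
    encoding[:, 0::2] = torch.sin(angles)
    encoding[:, 1::2] = torch.cos(angles)
    return encoding


n = 10
word_embeddings = torch.randn(n, 512)  # stand-in for n embedded source words
encoder_input = word_embeddings + sinusoidal_positional_encoding(n)
print(encoder_input.shape)  # torch.Size([10, 512])
```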

The Decoder’s forward pass from the main Transformer implementation is shown below.

    # The encoder output is injected directly into the sublayer of every Decoder. To build up the chain of Decoders
    # in PyTorch, so that we can put the full stack inside a Sequential block, we simply inject the encoder output
    # to the root Decoder, and have it output the encoder output (together with the actual Decoder output) as part of
    # the Decoder's actual output to make it easy for the next Decoder in the stack to consume the Encoder and Decoder
    # outputs
    def forward(self, input):
        encoder_output, previous_stage_output = input
        masked_mh_output = self.masked_multiheaded_attention_layer(
            self.masked_qkv_source.forward(previous_stage_output))
        input_qkv = self.unmasked_qkv_source.forward((encoder_output, masked_mh_output))
        mh_output = self.multiheaded_attention_layer(input_qkv)
        # Adds the residual connection to the output of the attention layer
        layer_normed_multihead_output = self.layer_norm(mh_output + previous_stage_output)
        ffnn_outputs = torch.stack(
            list(map(lambda attention_vector: self.feedforward_layer(attention_vector), layer_normed_multihead_output)))
        layer_normed_ffnn_output = self.layer_norm(ffnn_outputs + layer_normed_multihead_output)
        return (encoder_output, layer_normed_ffnn_output)

Data Flow

The diagram below (you’ll need to zoom in) shows the data flow for a single Encoder/Decoder, with 8 attention blocks per multihead attention layer. $n$ represents the number of words passed into the Encoder. $m$ represents the number of words passed into the Decoder. $V$ represents the size of the full vocabulary.

The dimensions of the data at each stage are depicted to facilitate understanding.

graph LR; subgraph Encoder encoder_src[Source Text]--nx512-->pos_encoding[Positional Encoding]; pos_encoding--nx512-->qkv_encoder[QKV Layer] qkv_encoder--Q=nx64-->multihead_attn_1[Attention 1] qkv_encoder--K=nx64-->multihead_attn_1 qkv_encoder--V=nx64-->multihead_attn_1 qkv_encoder--Q=nx64-->multihead_attn_2[Attention 2] qkv_encoder--K=nx64-->multihead_attn_2 qkv_encoder--V=nx64-->multihead_attn_2 qkv_encoder--Q=nx64-->multihead_attn_3[Attention 3] qkv_encoder--K=nx64-->multihead_attn_3 qkv_encoder--V=nx64-->multihead_attn_3 qkv_encoder--Q=nx64-->multihead_attn_4[Attention 4] qkv_encoder--K=nx64-->multihead_attn_4 qkv_encoder--V=nx64-->multihead_attn_4 qkv_encoder--Q=nx64-->multihead_attn_5[Attention 5] qkv_encoder--K=nx64-->multihead_attn_5 qkv_encoder--V=nx64-->multihead_attn_5 qkv_encoder--Q=nx64-->multihead_attn_6[Attention 6] qkv_encoder--K=nx64-->multihead_attn_6 qkv_encoder--V=nx64-->multihead_attn_6 qkv_encoder--Q=nx64-->multihead_attn_7[Attention 7] qkv_encoder--K=nx64-->multihead_attn_7 qkv_encoder--V=nx64-->multihead_attn_7 qkv_encoder--Q=nx64-->multihead_attn_8[Attention 8] qkv_encoder--K=nx64-->multihead_attn_8 qkv_encoder--V=nx64-->multihead_attn_8 subgraph EncoderMultiheadAttention[Encoder Multihead Attention] multihead_attn_1--nx64-->concat((Concatenate)) multihead_attn_2--nx64-->concat multihead_attn_3--nx64-->concat multihead_attn_4--nx64-->concat multihead_attn_5--nx64-->concat multihead_attn_6--nx64-->concat multihead_attn_7--nx64-->concat multihead_attn_8--nx64-->concat end concat--nx512-->linear_reproject[Linear Reprojection] linear_reproject--1x512-->ffnn_encoder_1[FFNN 1] linear_reproject--1x512-->ffnn_encoder_2[FFNN 2] linear_reproject--1x512-->ffnn_encoder_t[FFNN x] linear_reproject--1x512-->ffnn_encoder_n[FFNN n] subgraph FfnnEncoder[Feed Forward Neural Network] ffnn_encoder_1--1x512-->stack_encoder((Stack)) ffnn_encoder_2--1x512-->stack_encoder ffnn_encoder_t--1x512-->stack_encoder ffnn_encoder_n--1x512-->stack_encoder end 
stack_encoder--nx512-->encoder_output[Encoder Output] end subgraph Decoder decoder_target[Decoder Target]--mx512-->pos_encoding_2[Positional Encoding] pos_encoding_2--mx512-->qkv_decoder_1[QKV Layer] qkv_decoder_1--Q=mx64-->multihead_attn_masked_1[Attention 1] qkv_decoder_1--K=mx64-->multihead_attn_masked_1 qkv_decoder_1--V=mx64-->multihead_attn_masked_1 qkv_decoder_1--Q=mx64-->multihead_attn_masked_2[Attention 2] qkv_decoder_1--K=mx64-->multihead_attn_masked_2 qkv_decoder_1--V=mx64-->multihead_attn_masked_2 qkv_decoder_1--Q=mx64-->multihead_attn_masked_3[Attention 3] qkv_decoder_1--K=mx64-->multihead_attn_masked_3 qkv_decoder_1--V=mx64-->multihead_attn_masked_3 qkv_decoder_1--Q=mx64-->multihead_attn_masked_4[Attention 4] qkv_decoder_1--K=mx64-->multihead_attn_masked_4 qkv_decoder_1--V=mx64-->multihead_attn_masked_4 qkv_decoder_1--Q=mx64-->multihead_attn_masked_5[Attention 5] qkv_decoder_1--K=mx64-->multihead_attn_masked_5 qkv_decoder_1--V=mx64-->multihead_attn_masked_5 qkv_decoder_1--Q=mx64-->multihead_attn_masked_6[Attention 6] qkv_decoder_1--K=mx64-->multihead_attn_masked_6 qkv_decoder_1--V=mx64-->multihead_attn_masked_6 qkv_decoder_1--Q=mx64-->multihead_attn_masked_7[Attention 7] qkv_decoder_1--K=mx64-->multihead_attn_masked_7 qkv_decoder_1--V=mx64-->multihead_attn_masked_7 qkv_decoder_1--Q=mx64-->multihead_attn_masked_8[Attention 8] qkv_decoder_1--K=mx64-->multihead_attn_masked_8 qkv_decoder_1--V=mx64-->multihead_attn_masked_8 subgraph DecoderMaskedMultiheadAttention[Decoder Masked Multihead Attention] multihead_attn_masked_1--mx64-->concat_masked((Concatenate)) multihead_attn_masked_2--mx64-->concat_masked multihead_attn_masked_3--mx64-->concat_masked multihead_attn_masked_4--mx64-->concat_masked multihead_attn_masked_5--mx64-->concat_masked multihead_attn_masked_6--mx64-->concat_masked multihead_attn_masked_7--mx64-->concat_masked multihead_attn_masked_8--mx64-->concat_masked end concat_masked--mx512-->linear_reproject_masked[Linear Reprojection] 
linear_reproject_masked--1x512-->ffnn_encoder_1_masked[FFNN 1] linear_reproject_masked--1x512-->ffnn_encoder_2_masked[FFNN 2] linear_reproject_masked--1x512-->ffnn_encoder_t_masked[FFNN x] linear_reproject_masked--1x512-->ffnn_encoder_n_masked[FFNN n] subgraph FfnnEncoderMasked[Feed Forward Neural Network] ffnn_encoder_1_masked--1x512-->stack_decoder_masked((Stack)) ffnn_encoder_2_masked--1x512-->stack_decoder_masked ffnn_encoder_t_masked--1x512-->stack_decoder_masked ffnn_encoder_n_masked--1x512-->stack_decoder_masked end stack_decoder_masked--mx512-->query_project[Query Projection] encoder_output--nx512-->kv_project_decoder[Key-Value Projection] query_project--Q=mx64-->multihead_attn_unmasked_1[Attention 1] kv_project_decoder--K=nx64-->multihead_attn_unmasked_1 kv_project_decoder--V=nx64-->multihead_attn_unmasked_1 query_project--Q=mx64-->multihead_attn_unmasked_2[Attention 2] kv_project_decoder--K=nx64-->multihead_attn_unmasked_2 kv_project_decoder--V=nx64-->multihead_attn_unmasked_2 query_project--Q=mx64-->multihead_attn_unmasked_3[Attention 3] kv_project_decoder--K=nx64-->multihead_attn_unmasked_3 kv_project_decoder--V=nx64-->multihead_attn_unmasked_3 query_project--Q=mx64-->multihead_attn_unmasked_4[Attention 4] kv_project_decoder--K=nx64-->multihead_attn_unmasked_4 kv_project_decoder--V=nx64-->multihead_attn_unmasked_4 query_project--Q=mx64-->multihead_attn_unmasked_5[Attention 5] kv_project_decoder--K=nx64-->multihead_attn_unmasked_5 kv_project_decoder--V=nx64-->multihead_attn_unmasked_5 query_project--Q=mx64-->multihead_attn_unmasked_6[Attention 6] kv_project_decoder--K=nx64-->multihead_attn_unmasked_6 kv_project_decoder--V=nx64-->multihead_attn_unmasked_6 query_project--Q=mx64-->multihead_attn_unmasked_7[Attention 7] kv_project_decoder--K=nx64-->multihead_attn_unmasked_7 kv_project_decoder--V=nx64-->multihead_attn_unmasked_7 query_project--Q=mx64-->multihead_attn_unmasked_8[Multihead Attention 8] kv_project_decoder--K=nx64-->multihead_attn_unmasked_8 
kv_project_decoder--V=nx64-->multihead_attn_unmasked_8 subgraph DecoderUnmaskedMultiheadAttention[Decoder Unmasked Multihead Attention] multihead_attn_unmasked_1--mx64-->concat_unmasked((Concatenate)) multihead_attn_unmasked_2--mx64-->concat_unmasked multihead_attn_unmasked_3--mx64-->concat_unmasked multihead_attn_unmasked_4--mx64-->concat_unmasked multihead_attn_unmasked_5--mx64-->concat_unmasked multihead_attn_unmasked_6--mx64-->concat_unmasked multihead_attn_unmasked_7--mx64-->concat_unmasked multihead_attn_unmasked_8--mx64-->concat_unmasked end concat_unmasked--mx512-->linear_reproject_unmasked[Linear Reprojection] linear_reproject_unmasked--1x512-->ffnn_decoder_1_unmasked[FFNN 1] linear_reproject_unmasked--1x512-->ffnn_decoder_2_unmasked[FFNN 2] linear_reproject_unmasked--1x512-->ffnn_decoder_t_unmasked[FFNN x] linear_reproject_unmasked--1x512-->ffnn_decoder_n_unmasked[FFNN n] subgraph FfnnDecoderUnmasked[Feed Forward Neural Networks] ffnn_decoder_1_unmasked--1x512-->stack_decoder_unmasked((Stack)) ffnn_decoder_2_unmasked--1x512-->stack_decoder_unmasked ffnn_decoder_t_unmasked--1x512-->stack_decoder_unmasked ffnn_decoder_n_unmasked--1x512-->stack_decoder_unmasked end end stack_decoder_unmasked--mx512-->linear[Linear=512xV] subgraph OutputLayer[Output Layer] linear--mxV-->softmax[Softmax] softmax--mxV-->select_max_probabilities[Select Maximum Probability Token for each Position] select_max_probabilities--1xm-->transformer_output[Transformer Output] end
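To make the dimensions in the diagram concrete, here is a minimal PyTorch sketch of one masked multihead attention layer, assuming d_model = 512 and 8 heads (so each head works on m x 64 slices). The function names and the plain weight matrices are hypothetical and do not mirror the author’s Module classes; the upper-triangular mask is what stops position i from attending to later positions in the Decoder.

```python
import math

import torch


def masked_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention for one head with a causal mask; q, k, v are (m, d_k)."""
    m, d_k = q.shape
    scores = q @ k.transpose(0, 1) / math.sqrt(d_k)  # (m, m)
    # True above the diagonal: position i must not see positions j > i.
    causal_mask = torch.triu(torch.ones(m, m), diagonal=1).bool()
    scores = scores.masked_fill(causal_mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # each row sums to 1
    return weights @ v  # (m, d_k)


def multihead_masked_attention(x, w_q, w_k, w_v, w_o, num_heads: int = 8) -> torch.Tensor:
    """x is (m, 512); w_q, w_k, w_v, w_o are (512, 512). Returns (m, 512)."""
    m, d_model = x.shape
    d_k = d_model // num_heads  # 512 / 8 = 64
    q, k, v = x @ w_q, x @ w_k, x @ w_v  # each (m, 512)
    heads = [
        masked_attention(q[:, h * d_k:(h + 1) * d_k],
                         k[:, h * d_k:(h + 1) * d_k],
                         v[:, h * d_k:(h + 1) * d_k])
        for h in range(num_heads)
    ]  # num_heads tensors of shape (m, 64)
    concatenated = torch.cat(heads, dim=-1)  # (m, 512)
    return concatenated @ w_o  # linear reprojection, (m, 512)


torch.manual_seed(0)
m = 5  # number of target words fed to the Decoder
x = torch.randn(m, 512)
w_q, w_k, w_v, w_o = (torch.randn(512, 512) / math.sqrt(512) for _ in range(4))
out = multihead_masked_attention(x, w_q, w_k, w_v, w_o)
print(out.shape)  # torch.Size([5, 512])
```

Changing only the last input row changes only the last output row, which is the property the masked attention is meant to guarantee when the whole target sequence is fed in at once during training.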

Notes on the Code

During inference, the last word in the output is added to the output buffer.
The encoder output is injected directly into the sublayer of every Decoder. To build up the chain of Decoders in PyTorch, so that we can put the full stack inside a Sequential block, we simply inject the encoder output to the root Decoder, and have it output the encoder output (together with the actual Decoder output) as part of the Decoder’s actual output to make it easy for the next Decoder in the stack to consume the Encoder and Decoder outputs.
The code does not set up parameters in a form suitable for optimisation yet. There are several Module subclasses which are really only there for the convenience of not having to call the forward() methods explicitly. In a sequel, we will collapse most of the parameters into only a couple of Module subclasses.
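The inference behaviour described in the first note can be sketched as a greedy decoding loop. The `model` here is a hypothetical callable that maps the current output buffer of m tokens to an m x V tensor of logits, and `toy_model` is a stand-in that simply predicts the next integer; only the prediction for the last position is appended at each step.

```python
from typing import Callable, List

import torch


def greedy_decode(model: Callable[[List[int]], torch.Tensor],
                  start_token: int, end_token: int, max_length: int = 20) -> List[int]:
    """model maps the current output buffer (m token ids) to an (m, V) tensor of logits."""
    output_buffer = [start_token]
    while len(output_buffer) < max_length:
        logits = model(output_buffer)  # (m, V)
        probabilities = torch.softmax(logits, dim=-1)  # (m, V)
        # Only the prediction for the last position is new; append it to the buffer.
        next_token = int(torch.argmax(probabilities[-1]))
        output_buffer.append(next_token)
        if next_token == end_token:
            break
    return output_buffer


V = 6  # hypothetical vocabulary size


def toy_model(tokens: List[int]) -> torch.Tensor:
    """Stand-in for a trained Transformer: always predicts "previous token + 1"."""
    logits = torch.zeros(len(tokens), V)
    for position, token in enumerate(tokens):
        logits[position, (token + 1) % V] = 1.0
    return logits


print(greedy_decode(toy_model, start_token=0, end_token=4))  # [0, 1, 2, 3, 4]
```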

The class diagram is shown above. The composition hierarchy is quite straightforward, though there are some associations missing because of the shortcomings of the tool used to generate this (Pyreverse).
- Specifically, DecoderStack contains a bunch of Decoders, and EncoderStack contains a bunch of Encoders.
- qkv_source and masked_qkv_source contain instances of SingleSourceQKVLayer.
- unmasked_qkv_source contains an instance of MultiSourceQKVLayer.
- Some of the members in the classes are repeated because of Pyreverse duplicating the information from type hints.
More notes can be found in the source itself.

Conclusion

We have built and tested the basic Transformer architecture. However, we still need to do the following:

Build a proper vocabulary. Our current vocabulary is hard-coded, and contains random vectors.
Several tensors are reused as parameters. Some of these need to be separate parameters.
There are several Module subclasses. For optimisation, we will need to centralise where we register our parameters.
We still need to train the Transformer.

We will work on all of the above in the sequel to this post.

References

A Tale of Unintentional Learning

2023-01-11T00:00:00+05:30

TL;DR: I went from using Vim once a year to using it every day, by accident, when I got into a flow mindset after the effort of understanding a Machine Learning paper. It feels like a miracle.

I never intended to learn Vim. I was doing fine without it. I don’t even think it was anywhere near the top of my list of things to learn this year. Nor next year. Nor…well, you get the point. Truth is: I’ve only had to use Vim in some exceptional circumstances; that too not because it was the only editor available, but because I was too lazy to change my editor to something else when amending Git commits. Thus, I knew the bare minimum. In this context, “bare minimum” equals knowing how to exit Vim (yes, I know it’s :wq; get off my back!). The rest was about as uncivilised as you could expect: going into Insert mode and navigating using the arrow keys (horror of horrors!). It was slow, felt like pulling teeth, and I was just not motivated to learn it well enough to use it half-decently.

Well, that’s not quite true. I had given Vim a serious try several times over the last decade or so. Every so often, I’d fire up Vim and think to myself: “Today’s the day!”. I’d open up OpenVim or vimtutor and start doing a few exercises. This lasted about…10 minutes tops, before I got distracted by the next shiny thing and abandoned the endeavour. In fact, in retrospect, I am pretty sure that at the back of my mind, I was looking for any excuse to abandon my efforts. It didn’t necessarily feel like an uphill battle; the value system in my brain simply complained that I could be learning other, more useful things, and at some point, it won out.

Truth be told, I am not a fast typist. I cannot touch-type, and while programming, I’d be lost without my IntelliJ IDEA shortcuts. I suppose that is a saving grace; I avoid using the mouse as much as possible when coding, thanks to some good habits drilled into me by an amazing mentor at the start of my career (thanks Fred!). But still, everywhere else, it’s been a combination of arrow keys, Page Up/Down, and a lot of eye-rolling.

Now comes the good bit, or at least what I consider the interesting part.

I had been working on implementing a Machine Learning paper for the past couple of weeks (this one, in case anyone is interested). As part of my learning process, I was documenting the different concepts I was learning, under the firm belief that trying to explain it to an audience would expose holes in my understanding (the full series starts from here). The details are not too important, except that I was nearing the end of the series of blog posts I was writing. It was 3 am and I was about to write the last few paragraphs before heading to bed. Which was when a particular thought occurred.

The thought was: “Wouldn’t it be cool if I could write the rest of this in Vim?”

To this day, I cannot fathom why this particular thought occurred to me right there, right then, right in that mental state. There was no rhyme or reason to it; I just needed to finish the post, proof-read it, publish it to my blog, and be done for the day (night?). Maybe all those failed attempts at learning Vim over the years had bred a fierce combination of regret and longing; maybe it was something else. I will never know.

I launched Vim and started editing the file, resolving not to touch the arrow keys, no matter how long it took me to learn the Vim way of navigating. Alright, maybe I wasn’t that harsh on myself; I did use the arrow keys, but not as a tool for navigating wide swathes of text.

Thought process follows:

I need to get to the end of the file.
Google “go to end of file in vim”. Ahh, now how do I start editing? Well, I know that one, it’s i.
But no, it brings the cursor just before the last character.
Google “edit after last character”. Try it out a few times. Hey, this is pretty cool.
…

You know where this is going. The above sequence repeated itself many times during the course of the next hour or so. Now, the point I wanted to highlight is not the above sequence of events that I went through. No.

It is the fact that picking up and reusing the commands felt effortless.

Now, I will be the first to admit that there was no Matrix-like “I know kung-fu” moment. But, if I had to analyse my mental state at that point, in retrospect, there were two things I recall about it.

I was exhausted. Not exhausted in the sense of drowsy and drained; more along the lines of feeling like a rubber band which had stayed stretched for too long because of all the learning I’d been doing recently, including that night.
I was still in a state of learning. I was in a state of flow. It was very probably a side effect of all the effort I’d expended in understanding new concepts in the paper I was implementing.

Putting the two things together, the best analysis I can come up with is this: I was in a state where I could pick up anything I came across without having to kickstart my brain into motivating itself to learn.

Am I a Vim ninja now? Heck, no. Do I still use arrow keys sometimes? Oh yes. Am I using Vim a lot? Yes and no. I’m still using IntelliJ IDEA for my work, but I’ve enabled Vim mode in its editor windows. For notes and stuff at work, I’ve switched to Neovim full-time, having abandoned Sublime Text.

But I do try to put the Vim philosophy to use every day as part of my editing…whether it’s text or code.

I will not add more about how enjoyable it is, and how I feel like going back to using other text editors is like devolving from using metal tools to using sticks and stones, except to say that it really is as good as enthusiasts and fanatics say it is.

I’m reasonably sure this could constitute a learning hack for me: to learn something you aren’t necessarily motivated to learn, first learn something that you are naturally interested in. Once you are in that learning mode, learning the “less palatable” thing becomes just another thing to pick up.

Maybe, this was a one-off, a rare moment of inspiration. Who knows?

But I sure intend to try it again, and see how it works out :-)

Tests increase our Knowledge of the System: A Proof from Probability

2023-01-10T00:00:00+05:30

Note: This is a post from July 13, 2011, rescued from my old blog. This is only for archival purposes, and is reproduced verbatim, but I make no claims about its rigour, though it does still seem plausible.

This was an old proof that was up on my old blog, but since I’m no longer posting to that, I’m reposting it here for posterity. Also, rewriting the equations in LaTeX, now that I have installed a plugin for that.

I present a simple mathematical device to prove that tests improve our understanding of code. It does not really matter whether the code was written by the test author himself or is legacy code. To do this, some simplification of the situation is necessary.

We assume $X$ is the unit of code under consideration. $X$ may be a function, a class or a compiled binary. The only restriction on $X$ is that it can accept inputs and produce measurable outputs.

Without loss of generality, we may assume that $X$’s output consists of $n$ bits. If complicated structures like objects are present in the result, they may simply be decomposed into bits and laid out in a convenient order to fit this model. This assumption exists only to simplify quantizing the output space.

We also assume that unique inputs yield unique outputs, but this assumption does not affect the fundamental conclusion.

Let us define the probabilities:

$P\left(A\right)$ = Probability that $X$ uses the correct algorithm = $p$
$P\left(B\right)$ = Probability of test $T_1$ passing (getting the correct output) for a given input $I_1$ = $\frac{1}{2^n}$
$P(B|A)$ = Probability of test $T_1$ passing (getting the correct output) for a given input $I_1$, given $X$ uses the correct algorithm = 1

Therefore, using Bayes’ Theorem:

$P\left(A|B\right)$ = Probability that $X$ uses the correct algorithm given test passes for a given input $I_1$

We can thus write:

\[P\left(A|B\right) = P\left(B|A\right).P\left(A\right)/P\left(B\right) = p.2^n \\ P\left(1\right)=p.2^n\]

Note that after writing one test, the probability of $X$ using the correct algorithm has increased by a factor of $2^n$ (where $n \geq 1$).

Let us now write another test with input $I_2$. Note that I assume that $T_1$ passing does not affect any probabilities other than the updated probability of $X$ using the correct algorithm, i.e., the tests are statistically independent (I believe that’s the term used :-) .

$P\left(A\right)$ = Probability that $X$ uses the correct algorithm = $p.2^n$
$P\left(B\right)$ = Probability of test $T_2$ passing (getting the correct output) for a given input $I_2$ = $\frac{1}{2^n}$
$P\left(B|A\right)$ = Probability of test $T_2$ passing (getting the correct output) for a given input $I_2$, given $X$ uses the correct algorithm = 1

Therefore, using Bayes’ Theorem:

$P\left(A|B\right)$ = Probability that $X$ uses the correct algorithm given test passes for a given input $I_2$

\[P(A|B)= P\left(B|A\right).P\left(A\right)/P\left(B\right) = p.2^n.2^n \\ P\left(2\right)=p.2^n.2^n\]

After having written $t$ tests, we may write:

\[P\left(t\right)=p.2^{nt}\]

$t$ tests, therefore, increase the probability (or our knowledge, in very rough terms) of $X$ being implemented correctly by a factor of $2^{nt}$.

Probability is inversely correlated with entropy; thus, we have also reduced the entropy of the system. It might be useful to state that I use the term ‘test’ in a broad sense. The test may range from an automated unit test to human verification.

It turns out that it should be possible to determine $p$. Note that $t$ is the number of statistically independent tests that we can write. This implies that $t$ has a fixed upper bound. Thus:

$T$ = total number of statistically independent tests for $X$
$p.2^{nT} \leq 1$
$p \leq \frac{1}{2^{nT}}$
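The update rule above can be checked numerically. This sketch uses exact fractions and the post’s own simplifying assumptions ($P(B|A) = 1$ and $P(B) = \frac{1}{2^n}$ for every test); the values $n = 2$ and $T = 3$ are hypothetical. With $p$ at its upper bound of $\frac{1}{2^{nT}}$, the probability reaches exactly 1 after $T$ tests.

```python
from fractions import Fraction


def probability_after_tests(p: Fraction, n: int, t: int) -> Fraction:
    """Applies P(A|B) = P(B|A) * P(A) / P(B) once per passing test, t times."""
    p_b_given_a = 1  # a correct algorithm always passes
    p_b = Fraction(1, 2 ** n)  # the post's assumption for every test
    probability = p
    for _ in range(t):
        probability = p_b_given_a * probability / p_b
    return probability


n, T = 2, 3  # hypothetical: 2-bit outputs, 3 statistically independent tests
p = Fraction(1, 2 ** (n * T))  # the largest p allowed by p <= 1 / 2^(nT)
for t in range(T + 1):
    print(t, probability_after_tests(p, n, t))  # last line printed: 3 1
```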

A Pipeline for Adaptive Bitrate Video Encoding

2023-01-09T00:00:00+05:30

Note: This is a post from July 13, 2011, rescued from my old blog. This is only for archival purposes, and is reproduced verbatim, but is hopelessly outdated.

I’ve been working on something unusual lately, namely, building a pipeline for encoding video files into formats suitable for HTTP Live Streaming. The actual job of encoding into different formats at different bit rates and resolutions is done using a combination of ffmpeg and x264. To me, the interesting part lies in how we have tried to speed up the process, using the venerable Map-Reduce approach. Before I dive into the details, here’s a quick review of the basic idea of HLS.

Put very simply, adaptive streaming serves video content in multiple qualities, allowing the streaming client to choose which quality to use depending upon the bandwidth constraint on the consumer side. This is not a one-time choice: depending upon the encode cut duration, the client can switch to higher or lower resolutions dynamically throughout the entire playback of the video stream. How is this accomplished?

Assume that you slice up a video into multiple segments. Each of these segments can be as long or as short as you want; for argument’s sake, I shall assume that every segment lasts 10 seconds. Now, encode each of these segments at different levels of quality, say a low bit rate for mobile consumption, a high quality one for fat broadband connections, etc. What you ultimately get is many versions of each segment of video, each version of a different quality. What you ultimately publish on your server is essentially a list of the names of these segments. This is the playlist for a video encoded with adaptive streaming. When the client asks you to open a network stream, the URL you put in is that of the playlist. The situation then looks something like this.

The client ends up retrieving a master playlist which links to the quality-specific playlists that you see in the picture above. That’s more or less the general idea. One thing to note is that even though the stream is served in segments, the segments themselves do not necessarily have to be separate files. In fact, Microsoft’s take on adaptive streaming involves internally fragmenting a video file to generate a single .ismv file per quality. This .ismv file is internally fragmented, so even though it’s a single file in the filesystem, its content is served in segments. That is how we’ve chosen to serve these files. CodeShop has a tool called mp4split which generates these .ismv files. It can also generate playlists in different formats; for example, Apple’s .m3u8 format, Silverlight’s .ismc format and Flash’s .f4m format. Of course, simply having these files isn’t enough, since the segments ‘reside’ inside the single .ismv file. The server needs to recognise requests for segments and extract those segments. For this purpose, we’ve used CodeShop’s Nginx module for serving these files; see here and here.
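As an illustration of Apple’s .m3u8 format, here is a minimal sketch that writes a media playlist for one quality and a master playlist pointing at each quality’s playlist. The segment names, bitrates and resolutions are hypothetical, and in the setup described here mp4split generated the playlists instead.

```python
import math
from typing import List, Tuple


def media_playlist(segment_uris: List[str], segment_duration: float) -> str:
    """One playlist per quality: lists that quality's segments in playback order."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{math.ceil(segment_duration)}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for uri in segment_uris:
        lines.append(f"#EXTINF:{segment_duration:.1f},")  # duration of the segment that follows
        lines.append(uri)
    lines.append("#EXT-X-ENDLIST")  # on-demand video: no more segments will be added
    return "\n".join(lines) + "\n"


def master_playlist(variants: List[Tuple[int, str, str]]) -> str:
    """variants: (bandwidth in bits/s, resolution "WxH", URI of that quality's playlist)."""
    lines = ["#EXTM3U"]
    for bandwidth, resolution, uri in variants:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(uri)
    return "\n".join(lines) + "\n"


# Hypothetical example: a 30-second video cut into 10-second segments, at two qualities.
low = media_playlist([f"low/segment{i}.ts" for i in range(3)], 10.0)
master = master_playlist([
    (400_000, "480x270", "low/index.m3u8"),
    (2_500_000, "1280x720", "high/index.m3u8"),
])
print(master)
```

The client only needs the URL of the master playlist; it picks a variant by bandwidth and can switch variants at any segment boundary.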

So we wanted to implement this. The only issue is that for a large video file, encoding takes a very long time. Encoding is a parallelisable task; you can partition a file into small segments and encode them independently. This is what our pipeline looks like:

Transcode Stage: This stage essentially transcodes the video from its original format (.avi/.mov, etc.) into an MP4 format using H264 and AAC encodings for video and audio, respectively. This is done using the x264 utility.

Split Stage: This splits the MP4 video into 2-second segments, and generates information which will be used by the adaptive streaming encoding processes to determine which quality to encode to.

Encode Stage: This is where the parallelism comes in. Basically, multiple daemons pick up assignments from a (Starling) queue and encode the segments at the desired qualities; there is one daemon per segment per encoding level (quality).

Merge Stage: This is where the encoded segments are joined back to form a single .mp4 video file per encoding level.

Fragment and Generate Playlist: The video files are internally fragmented using mp4split, and the playlist files are generated.
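The split, encode and merge stages follow the Map-Reduce shape. Here is a minimal sketch of that shape, assuming a thread pool as a stand-in for the Starling queue and its daemons; `encode()` is a hypothetical placeholder that only labels each segment rather than calling ffmpeg or x264.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # (start second, end second)


def split(duration_seconds: int, segment_seconds: int = 2) -> List[Segment]:
    """Split stage: cut the transcoded video into fixed-length segments."""
    return [(start, min(start + segment_seconds, duration_seconds))
            for start in range(0, duration_seconds, segment_seconds)]


def encode(job: Tuple[int, Segment, str]) -> Tuple[str, int, str]:
    """Encode stage (hypothetical stand-in): a real worker would run the encoder here."""
    index, (start, end), quality = job
    return quality, index, f"{quality}/segment{index:04d}[{start}-{end}s]"


def merge(encoded: List[Tuple[str, int, str]]) -> Dict[str, List[str]]:
    """Merge stage: regroup encoded segments per quality, in playback order."""
    ordered = sorted(encoded)  # sorts by quality, then by segment index
    return {quality: [name for _, _, name in group]
            for quality, group in groupby(ordered, key=lambda item: item[0])}


def run_pipeline(duration_seconds: int, qualities: List[str], workers: int = 4) -> Dict[str, List[str]]:
    segments = split(duration_seconds)
    # One job per segment per quality, mirroring one daemon per segment per encoding level.
    jobs = [(index, segment, quality)
            for index, segment in enumerate(segments) for quality in qualities]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        encoded = list(pool.map(encode, jobs))  # the "map" step
    return merge(encoded)  # the "reduce" step


result = run_pipeline(7, ["high", "low"])
print(result["low"])
# ['low/segment0000[0-2s]', 'low/segment0001[2-4s]', 'low/segment0002[4-6s]', 'low/segment0003[6-7s]']
```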

Vim and TMux Commands Galore

2023-01-05T00:00:00+05:30

This short post lists the Neovim (Vim) shortcuts I am getting used to. I’ve recently switched to trying the Vim mode for my IDE needs, and having used Vim previously only for very simple tasks, am having a blast practising the basic Vim shortcuts. Ultimately, I will probably move to doing more IDE-related work in native Vim too.

I’ve also added TMux shortcuts because I’m learning to use that too.

Vim commands

u/Ctrl-r: Undo/Redo
.: Repeat last command
w/b: Move forward/backward by a word
s refers to a sentence. Thus dis deletes the sentence the cursor is in, and das deletes the sentence together with the whitespace around it.
(/): Jump to previous/next sentence
i/a/I/A: Start editing before/after cursor, before start/after end of line
- i: Considers whitespace as words too, so 2iw selects a word and any whitespace after it.
- a: Considers word + whitespace as a text object, so 2aw selects "text1 text2 ".
0/_/$: Go to starting character / starting non-whitespace character / end of line
d: Delete (suffix with counter and text object, like d2w, dd)
c: Change (suffix with counter and text object, like c2w, cc)
r: Replace the character under the cursor with the next character typed (prefix with a count, like 3rx, to replace several characters)
y: Yank (suffix with counter and text object, like y2w, yy)
- Interesting Use Case: ny$ yanks from cursor to end of line n times, so n lines, starting from the current cursor position.
F-x/f-x: Find character x before/after
*: Search forward for the word under the cursor
/ and ?: Find string forward/backward
P/p: Paste before/after cursor
x: Delete character under cursor
nG/ngg: Go to line number n
G: Go to end of file
i: “Everything inside” qualifier used in conjunction with other verbs, like diw, ci"
Ctrl-v: Visual Block Mode, use I to insert en masse
Ctrl-o/Ctrl-i: Go to old/new positions
Ctrl-u/Ctrl-d: Move up/down half a page
{/}: Jump back/forward across a contiguous block of text (paragraph)
+/-: Jump to start of next/previous line

Ex commands

x,y{cmd} z: Defines an inclusive range of lines from x to y and performs the command {cmd} on it, with optional argument z.
- m/t: Move/Copy the range of lines to after line z. Example: 10,20m30. Single-line variants like 10m30 also work.
x;+/-n: Defines range of +/-n starting from line x
., .+/-n: Refers to the current line / Refers to n lines after/before current line.
$, $+/-n: Refers to the last line of the document (Compare to going to last character in line in Vim’s Normal mode). +/-n navigates n lines after/before last line.
%: Refers to all lines (same as 1,$)
/pattern/ and ?pattern?: Searches forward and backward for pattern. This can be used as a location argument in other commands.
:/:: Moves backwards/forwards through command history.

TMux commands

?: View all keybindings
%: Horizontal split
": Vertical split
: Moves across TMux panes
d: Detaches from current TMux session
[: Enables scroll mode
: Enables highlight mode after entering scroll mode. Press to yank highlighted text.
]: Pastes copied content to another TMux terminal
tmux ls: Lists running TMux sessions
tmux attach -t : Attaches to specified TMux session
tmux rename-session -t : Renames a TMux session
,: Renames current window
tmux new -s : Creates a new TMux session with given SessionID