ThoughtWorks Tech Radar : Mechanical Sympathy

My take on Mechanical Sympathy (from the ThoughtWorks Technology Radar), which I presented at the Sheraton Bangalore, is based on the content below.

“The most amazing achievement of the computer software industry is its continuing cancellation of the steady and staggering gains made by the computer hardware industry.” – Henry Petroski

“Premature optimisation is the root of all evil.” – Donald Knuth

The Hibernian Express is the first transatlantic fiber-optic communications cable to be laid in 10 years, at a cost of $300 million. The current speed record for establishing transatlantic communication is 65 milliseconds. The Hibernian Express will reduce that. By all of 6 milliseconds. If that isn’t a very expensive optimisation, I do not know what is.

In almost all cases, the code that we write is abstracted away from the internals of the hardware. This is a desirable and necessary thing. However, particular domains require applications to operate under a set of exacting constraints. Ultra-low latency trading in the HFT arena, for example, typically demands volumes of over 5,000 orders a second, with order and execution-report round-trip times of 100 microseconds. In such cases, tailoring your architecture to handle concurrency is no longer an idle option; it is a necessity. Even for more prosaic applications, it is not uncommon to need low-latency data structures.

Usually, achieving low latency boils down to minimising the time spent on concurrency management relative to actual logic processing. Today’s programming languages provide a variety of constructs to model concurrent operations: locks, mutexes, and memory barriers, to name a few. Even at the opcode level, you may use compare-and-swap (CAS) operations, which are cheaper than locks. However, to move to the upper end of the curve, to get to really low latency, many designers eschew all of these constructs.
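To make the CAS idea concrete, here is a minimal sketch (the class name `CasCounter` is my own, purely for illustration) of a lock-free counter built on `java.util.concurrent.atomic.AtomicLong`, whose `compareAndSet` method maps to a single CAS instruction on most hardware:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: a lock-free counter built on compare-and-swap (CAS).
// AtomicLong.compareAndSet maps to a CAS opcode on most platforms
// (e.g. LOCK CMPXCHG on x86), avoiding the overhead of a full lock.
public class CasCounter {
    private final AtomicLong value = new AtomicLong(0);

    public long increment() {
        long current, next;
        do {
            current = value.get();
            next = current + 1;
            // Retry if another thread changed the value in between.
        } while (!value.compareAndSet(current, next));
        return next;
    }

    public long get() {
        return value.get();
    }
}
```

The retry loop is the characteristic CAS pattern: instead of blocking, a losing thread simply re-reads and tries again.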

One good example is the Disruptor, a high-performance concurrency framework for Java. In a series of excellent articles, Martin Thompson, one of the authors of the Disruptor framework, discusses techniques to reduce latency, such as write combining, lock-free algorithms, and the Single Writer Principle.
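As a rough illustration of the Single Writer Principle (this is my own simplified sketch, not Disruptor code; it assumes exactly one producer thread and one consumer thread), consider a ring buffer where each sequence counter is written by only one thread, so plain volatile stores suffice and neither locks nor CAS appear on the hot path:

```java
// Sketch of the Single Writer Principle: each counter below has
// exactly one writing thread, so volatile visibility is enough.
// Assumes a single producer and a single consumer.
public class SingleWriterQueue {
    private final long[] buffer;
    private final int mask;
    private volatile long head = 0; // written only by the consumer
    private volatile long tail = 0; // written only by the producer

    public SingleWriterQueue(int sizePowerOfTwo) {
        buffer = new long[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
    }

    /** Called only from the producer thread. */
    public boolean offer(long value) {
        if (tail - head == buffer.length) return false; // full
        buffer[(int) (tail & mask)] = value;
        tail = tail + 1; // single writer: plain store, no CAS needed
        return true;
    }

    /** Called only from the consumer thread. */
    public Long poll() {
        if (head == tail) return null; // empty
        long value = buffer[(int) (head & mask)];
        head = head + 1;
        return value;
    }
}
```

Because no location is contended by two writers, there is nothing to fight over, and the cost of coordination collapses to a pair of volatile reads and writes.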

Even when lock contention is not the issue, there are other ways of reducing latency. One example comes from a team working to increase the performance of their custom JMS implementation: they wrote their own implementation of the JDK Executor interface (the interface responsible for firing off Runnable jobs) and achieved a tenfold improvement.
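The article does not show that team's actual code, but the Executor interface itself is tiny, a single `execute(Runnable)` method, so a custom implementation can be very simple. Here is a hedged sketch (class name `CallerRunsExecutor` is mine) that runs each task on the calling thread, eliminating queue hand-off and context-switch costs for short tasks:

```java
import java.util.concurrent.Executor;

// Hypothetical sketch of a custom Executor: running the Runnable
// directly on the calling thread avoids the queue hand-off and
// context switches of a thread pool. Suitable only when tasks
// are short and must not block the caller for long.
public class CallerRunsExecutor implements Executor {
    @Override
    public void execute(Runnable command) {
        command.run(); // no pool, no queue, no context switch
    }
}
```

The point is not that this particular strategy is always right, but that the JDK leaves the execution policy entirely up to the implementation, so a team that knows its workload can exploit that freedom.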

One of the more explicit forms of mechanical sympathy is rewriting software to execute on specially designed hardware. GPUs and FPGAs are commonplace in financial computing.

Indirectly, this form of thinking also seems to have influenced the design of single-threaded servers with asynchronous I/O. A multi-threaded server constantly pays the cost of context switches between threads; with a single-threaded model, that overhead disappears and latency is greatly reduced.
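The single-threaded asynchronous pattern can be sketched in Java with `java.nio` (my own minimal echo-server example, with error handling trimmed; the class name and port are arbitrary): one thread multiplexes every connection through a Selector, so no context switching between worker threads ever occurs.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Sketch: a single-threaded echo server using asynchronous I/O.
// One thread services every connection via the Selector, so there
// is no context switching between worker threads.
public class SingleThreadedServer {
    public static void serve(int port) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(port));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (selector.isOpen()) {
            selector.select(); // block until some channel is ready
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    if (client.read(buf) < 0) {
                        client.close(); // peer disconnected
                    } else {
                        buf.flip();
                        client.write(buf); // echo the bytes back
                    }
                }
            }
            selector.selectedKeys().clear();
        }
    }
}
```

This is the same event-loop shape popularised by servers such as Node.js and nginx: readiness notification plus non-blocking handlers on a single thread.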