
ThoughtWorks Tech Radar : Mechanical Sympathy

My take on Mechanical Sympathy (from the ThoughtWorks Technology Radar), which I presented at the Sheraton Bangalore, is based on the content below.

“The most amazing achievement of the computer software industry is its continuing cancellation of the steady and staggering gains made by the computer hardware industry.” – Henry Petroski

“Premature optimisation is the root of all evil.” – Donald Knuth

The Hibernian Express is the first transatlantic fiber-optic communications cable to be laid in 10 years, at a cost of $300 million. The current speed record for transatlantic communication is 65 milliseconds. The Hibernian Express will reduce that. By all of 6 milliseconds. If that isn’t a very expensive optimisation, I do not know what is.

In almost all cases, the code that we write is abstracted away from the internals of the hardware. This is a desirable and necessary thing. However, particular domains require applications to operate under a set of exacting constraints. Ultra-low latency trading in the HFT arena, for instance, typically involves handling over 5,000 orders a second with order and execution-report round-trip times of around 100 microseconds. In such cases, tailoring your architecture to handle concurrency is no longer an idle option; it is a necessity. Even for more prosaic applications, it is not uncommon to need low latency data structures.

Usually, achieving low latency boils down to minimising the time spent on concurrency management relative to actual logic processing. Today’s programming languages provide a variety of constructs to model concurrent operations: locks, mutexes, and memory barriers, to name a few. Even at the opcode level, you may use CAS (compare-and-swap) operations, which are cheaper than locks. However, to move to the upper end of the curve, to get to really low latency, many designers eschew all of these constructs.
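To make the CAS option concrete before moving on, here is a minimal sketch of my own (not something from the Radar): a lock-free counter built on a compare-and-set retry loop, using the JDK’s atomic classes.

    import java.util.concurrent.atomic.AtomicLong;

    // A lock-free counter: threads race via compare-and-set instead of blocking on a lock.
    public class CasCounter {
        private final AtomicLong value = new AtomicLong();

        public long increment() {
            long current, next;
            do {
                current = value.get();
                next = current + 1;
                // CAS succeeds only if no other thread changed the value in between;
                // on failure we simply re-read and retry rather than block.
            } while (!value.compareAndSet(current, next));
            return next;
        }

        public long get() {
            return value.get();
        }
    }

The interesting property is that a losing thread never blocks or yields to the scheduler; it simply re-reads and retries.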

One good example is the Disruptor, a high-performance concurrency framework for Java. In a series of excellent articles, Martin Thompson, one of the authors of the Disruptor framework, discusses techniques for reducing latency such as write combining, lock-free algorithms, and the Single Writer Principle.
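The Single Writer Principle, in particular, is easier to show than to describe. A minimal sketch (my illustration, not the Disruptor’s actual code): if only one thread ever mutates a piece of state, the write path needs neither locks nor CAS retries.

    // Single Writer Principle: one designated thread owns all writes to this sequence,
    // so the writer never contends; readers only observe the published value.
    public class SingleWriterSequence {
        private volatile long sequence = 0;  // the volatile store publishes writes to readers

        // Must only ever be called from the single writer thread.
        public void publish(long next) {
            sequence = next;   // plain increment-and-store; no lock, no CAS retry loop
        }

        // Safe to call from any number of reader threads.
        public long lastPublished() {
            return sequence;
        }
    }

The Disruptor takes this idea much further, but the underlying claim is the same: an uncontended write is about as cheap as a write can get.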

Even when lock contention is an issue, there are other ways of reducing latency. One example comes from a team working to improve the performance of their custom JMS implementation: they wrote their own implementation of the JDK Executor interface (the interface responsible for firing off Runnable jobs), and that alone resulted in an improvement by a factor of 10.
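The write-up does not describe what that custom Executor looked like, so the sketch below is purely illustrative: a deliberately trivial implementation of the same JDK interface that runs each task on the calling thread, just to show how small the contract actually is compared with the general-purpose executors the JDK ships with.

    import java.util.concurrent.Executor;

    // A deliberately trivial Executor: runs each task on the caller's thread,
    // trading generality for zero queueing, zero handoff, and zero context switches.
    public class CallerRunsExecutor implements Executor {
        @Override
        public void execute(Runnable command) {
            command.run();
        }
    }

    // Usage:
    //     Executor executor = new CallerRunsExecutor();
    //     executor.execute(() -> System.out.println("ran inline"));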

One of the more explicit forms of mechanical sympathy is when you rewrite software to execute on specially designed hardware. GPUs and FPGAs are commonplace in financial computing.

Indirectly, this form of thinking also seems to have influenced the design of single-threaded servers with asynchronous I/O. In a multi-threaded server, you, or rather the server, are constantly faced with having to switch contexts between threads. With a single-threaded model, that cost largely disappears and latency is greatly reduced.
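As a rough illustration (not tied to any particular server product), the skeleton of such a design in Java NIO is a single thread spinning over a Selector; every connection is multiplexed onto that one thread, so there is no cross-thread context switching at all.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;

    // A single-threaded echo server: one thread multiplexes all connections
    // via a Selector, so requests never hop between worker threads.
    public class SingleThreadedServer {
        public static void main(String[] args) throws IOException {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(9090));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            ByteBuffer buffer = ByteBuffer.allocate(1024);
            while (true) {
                selector.select();                      // block until something is ready
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isAcceptable()) {           // new connection
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {      // data ready: echo it back
                        SocketChannel client = (SocketChannel) key.channel();
                        buffer.clear();
                        if (client.read(buffer) == -1) {
                            client.close();
                        } else {
                            buffer.flip();
                            client.write(buffer);
                        }
                    }
                }
                selector.selectedKeys().clear();
            }
        }
    }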

ThoughtWorks Tech Radar : Agile Analytics

My take on Agile Analytics from the ThoughtWorks Technology Radar, which I presented at the Sheraton Bangalore today, is based on the following document.

Patient: Will I survive this risky operation?
Surgeon: Yes, I’m absolutely sure that you will survive the operation.
Patient: How can you be so sure?
Surgeon: Well, 9 out of 10 patients die in this operation, and yesterday my ninth patient died.

Andrew Lang, a Scottish writer and collector of folk tales, once remarked that many people use statistics as a drunken man uses lamp-posts…for support rather than illumination. Even so, we have come a long way from the 9th century, when Al-Kindi, working in Baghdad, used statistical frequency analysis to decipher encrypted messages and developed the first code-breaking algorithm – incidentally, he was also instrumental in introducing the base-10 Indian numeral system to the Islamic and Christian worlds.

1654 – Pascal and Fermat create the mathematical theory of probability.
1761 – Thomas Bayes proves Bayes’ theorem.
1948 – Shannon’s Mathematical Theory of Communication defines the capacity of communication channels in terms of probabilities. Bit of a game changer, that one. All our designs of communication networks and error-correction algorithms stem from insights found in that work.

Today, we realise that the pace at which we collect data far exceeds our capability to make sense of it. Data is everywhere, *literally*. The blood cells in your body trying to determine whether that molecule is an oxygen molecule or not? That is data. Your build breaking? That is data. You’re running a static analysis tool to check your test coverage? Yeah, that is data analysis.
Unfortunately, we are at the point where our opinions about whether a piece of data is even relevant to analysis form far too slowly. How slowly? Well, human reflexes take milliseconds, while CPUs and GPUs operate on the order of nanoseconds. That is six orders of magnitude. And that is how slow we are.

This, we cannot afford to be. In the past century, data collection was the bottleneck; datasets larger than a few kilobytes were unheard of. Now, we are playing in gigabyte territory. When I was consulting with a telecommunications company a few months back, the calls through their network generated upwards of 600 MB of data per day.

Volume is not the only dimension of this deluge of data. The rate at which new data flows in gives us pause too. Think of the stock markets: imagine having to make decisions based on data that will become obsolete within a few minutes, or even a few seconds. Analytics is not a goal in itself. It is merely an aid to decision-making. Given the speed at which new data is collected, and the speed at which old data fades into obsolescence, we must be prepared to deal with incomplete, fast-flowing data.

Think of it as a stream from which you scoop a handful of water to determine the level of bacteria in it. You only have limited information from a single sample, but if you sample from multiple points upstream and downstream, you will eventually get a fairly accurate answer to your question.
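One concrete way of doing exactly this with a data stream rather than a river is reservoir sampling: keep a fixed-size sample that stays uniformly representative no matter how much data has flowed past. A minimal sketch of my own, for illustration:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ThreadLocalRandom;

    // Reservoir sampling: maintains a uniform random sample of fixed size k
    // from a stream whose total length is unknown in advance.
    public class Reservoir<T> {
        private final int k;
        private final List<T> sample = new ArrayList<>();
        private long seen = 0;

        public Reservoir(int k) {
            this.k = k;
        }

        public void offer(T item) {
            seen++;
            if (sample.size() < k) {
                sample.add(item);                                   // fill the reservoir first
            } else {
                long j = ThreadLocalRandom.current().nextLong(seen); // uniform index in [0, seen)
                if (j < k) {
                    sample.set((int) j, item);                      // replace with probability k/seen
                }
            }
        }

        public List<T> sample() {
            return sample;
        }
    }

Each item ends up in the sample with probability k divided by the number of items seen so far, which is exactly the “scoop a handful at several points” idea made precise.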

Agile Analytics conjures up images of iterations, collaboration with customers, and fast feedback when working on DW/BI projects. Indeed, this is what Ken Collier talks about in his book Agile Analytics. However, I wish to tackle a different angle. Hal Varian, Chief Economist at Google, believes that the dream job of this decade is that of a statistician. Everyone has data. It’s much harder to get opinions about the data. It’s harder to, as he says, “tell a story about this data”.
We’re at a moment in the software industry where lots of things have begun to intersect with our field of interest. Statistics is one of them. Assume you are a software engineer, and have more than a peripheral interest in this field. What do you do?

Learn classical statistics. Learn Bayesian statistics. You probably hated the textbooks, so don’t use them; there are tons of more useful educational resources on the Web. Get into machine learning. Understand that machine learning is not some super-exotic field of study. I’ll go out on a limb and say that Machine Learning is just More Statistics under a trendy name.
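If “Bayesian” sounds abstract, the core move is small: start with a prior belief, observe some data, compute a posterior. A toy example of my own (estimating a coin’s bias with a Beta prior, where conjugacy makes the update a one-liner):

    // A toy Bayesian update: estimating the bias of a coin.
    // With a Beta(alpha, beta) prior and binomial data, the posterior is simply
    // Beta(alpha + heads, beta + tails) -- conjugacy makes the update one line.
    public class CoinBias {
        public static void main(String[] args) {
            double alpha = 1.0, beta = 1.0;   // uniform prior: no opinion about the bias
            int heads = 7, tails = 3;         // observed data: 10 flips

            double posteriorAlpha = alpha + heads;
            double posteriorBeta = beta + tails;
            double posteriorMean = posteriorAlpha / (posteriorAlpha + posteriorBeta);

            System.out.printf("Posterior mean bias: %.3f%n", posteriorMean);  // prints 0.667
        }
    }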
Get acquainted with a few languages and libraries: R, NumPy, Julia. In fact, I’m super-excited by Julia because it offers native building blocks for distributed computation. Read a few papers on real-world distributed systems.

I do not talk about this because you’ll be building a distributed analytics engine from scratch (though you could). Through studying the subjects above, you will gain a much deeper understanding of why you should be analysing something, and also of how such systems are built. You are all, regardless of your previous backgrounds, engineers.

You will also encounter a lot of literature on visualisation along the way. Visualisation is one of those things we don’t really pay much attention to until we really need it. Bars, graphs, colours: anything in lieu of raw numbers that can give us some visual indication of what’s going on. Health check pages, for example, are a useful way of pulling together a system’s diagnostic information.
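As a small, hypothetical illustration (using only the JDK’s built-in com.sun.net.httpserver, not any particular monitoring framework), a health check page can be as simple as one endpoint that gathers a few diagnostic numbers:

    import com.sun.net.httpserver.HttpServer;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;

    // A bare-bones health check page: one endpoint that aggregates a few
    // diagnostic signals into something a human (or a dashboard) can glance at.
    public class HealthCheckServer {
        public static void main(String[] args) throws IOException {
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            server.createContext("/health", exchange -> {
                long freeMemory = Runtime.getRuntime().freeMemory();
                int threads = Thread.activeCount();
                String body = String.format(
                    "{\"status\":\"ok\",\"freeMemoryBytes\":%d,\"activeThreads\":%d}",
                    freeMemory, threads);
                byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().add("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, bytes.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(bytes);
                }
            });
            server.start();
        }
    }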

ThoughtWorks Tech Radar : GPGPU

My take on GPGPUs from the ThoughtWorks Technology Radar, which I presented at the Sheraton Bangalore today, is based on the following document.

Seth Lloyd, a professor of mechanical engineering at MIT, once asked what the fastest computer in the universe would look like. Setting aside concerns of fabrication, a circuit is only as fast as the speed at which you can flip a bit from 0 to 1, or vice versa, and the faster the circuit, the more energy it consumes. Plugging in the theoretical numbers, Lloyd came to the conclusion that a reasonably sized computer running at that speed would not look like one of the contraptions in front of you. In fact, it would be, to put not too fine a point on it, a black hole.

Well, we are somewhat far away from that realisation, but the fact is that most of us do not realise the potential sitting inside each and every one of our laptops and desktops. Allow me an example. The ATI Radeon HD 5870 GPU, codenamed Cypress, packs enough processing units to support 30,000 threads. To put not too fine a point on it, I could take this room, and all of you in it, replicate it 3,000 times, have each of you do a calculation, and this chunk of sand and solder would still be faster. And smaller.

We are at the point where vendors have begun to release GPUs that are specifically not designed for graphics processing; I hesitate even to call such units GPUs. Take, for example, the NVIDIA Tesla. The Tesla is not even capable of outputting video, by default. Yet it powers Tianhe-1, the second fastest supercomputer on the planet. Or take the Titan supercomputer. Running on close to 20,000 Tesla GPUs, it is a public-access supercomputer, meaning that should you feel the need to do some terascale research, you can log onto it, right now.

Today, there are multiple streams of vendor-specific GPU technology, all based on the venerable C99 standard with language extensions. In the case of NVIDIA, it is CUDA; in the case of AMD, it is the Stream Processing SDK. However, the portable option, which works across GPUs as well as CPUs and is gaining traction, is OpenCL.

Computing on the GPU requires a programming model not unlike the well-known MapReduce model: stream processing. It requires you to create a computational kernel, in effect a function, which is then applied to blocks of data. There are further constraints on the kernel code you can write. Essentially, to take advantage of stream processing, look for problems that involve high compute intensity and near-total data parallelism.
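The shape of that model is easy to see even without GPU hardware. The sketch below is a CPU-side analogy in plain Java (not actual CUDA or OpenCL): one pure “kernel” function applied independently to every element of a large block of data, which is exactly the kind of data parallelism a GPU exploits across thousands of threads.

    import java.util.stream.IntStream;

    // A CPU-side analogy of stream processing: a "kernel" (pure function) applied
    // independently to every element of a block of data. On a GPU the same shape
    // would be expressed as a CUDA or OpenCL kernel launched over thousands of threads.
    public class KernelAnalogy {
        // The computational kernel: no shared state, no ordering dependencies.
        static float kernel(float x) {
            return x * x + 1.0f;
        }

        public static void main(String[] args) {
            float[] data = new float[1_000_000];
            for (int i = 0; i < data.length; i++) {
                data[i] = i * 0.001f;
            }

            float[] result = new float[data.length];
            // Near-total data parallelism: each index is processed independently.
            IntStream.range(0, data.length)
                     .parallel()
                     .forEach(i -> result[i] = kernel(data[i]));

            System.out.println("result[42] = " + result[42]);
        }
    }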

Bioinformatics, Computational Finance, Medical Imaging, Molecular Dynamics, Weather and Climate Forecasting…anywhere you have a ton of data waiting to be crunched, GPU computing is a perfect fit. Even Hadoop has support for CUDA at this moment. GPUs are now ubiquitous; I’d probably risk calling them commodity hardware at this point. They sit in almost all of your machines, powering your displays, rendering your games. Never have developers been privy to so much power within so little space. And it’s not even a black hole yet. So, go forth and compute!