Tag Archives: GPGPU

ThoughtWorks Tech Radar: GPGPU

My take on GPGPUs from the ThoughtWorks Technology Radar, which I presented at the Sheraton Bangalore today, is based on the following document.

Seth Lloyd, a professor of mechanical engineering at MIT, once asked what the fastest computer in the universe would look like. Setting aside concerns of fabrication, a circuit is only as fast as the speed at which you can flip a bit from 0 to 1, or vice versa, and the faster the circuit, the more energy it consumes. Plugging in the theoretical numbers, Lloyd came to the conclusion that a reasonably sized computer running at that limit would not look like one of the contraptions in front of you. In fact, it would be, not to put too fine a point on it, a black hole.
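For the curious, the bound Lloyd worked from is the Margolus–Levitin theorem, which limits any physical system with energy E to at most 2E/πħ elementary operations per second. For a one-kilogram "ultimate laptop", E = mc² gives roughly 9 × 10^16 joules, which works out to about 5 × 10^50 operations per second. Those figures are rounded from Lloyd's published estimate.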

Well, we are still some way from that realisation, but the fact is that most of us do not realise the potential that sits in each and every one of our laptops and desktops. Allow me an example. The ATI Radeon HD 5870 GPU, codenamed Cypress, packs enough processing units to keep roughly 30,000 threads in flight. To put it concretely: I could take this room and everyone in it, replicate it 3,000 times, have each of you perform a calculation, and this chunk of sand and solder would still be faster. And smaller.

We are at the point where vendors have begun to release GPUs that are specifically not designed for graphics processing. I hesitate even to call such units GPUs; take, for example, the NVIDIA Tesla. The Tesla is not, by default, even capable of video output. Yet it powers Tianhe-1A, the second fastest supercomputer on the planet. Or take the Titan supercomputer. Running on close to 19,000 Tesla GPUs, it is a public-access machine, meaning that should you feel the need to do some terascale research, you can apply for time on it, right now.

Today, there are multiple streams of vendor-specific GPU technology, all based on the venerable C99 standard with language extensions. In the case of NVIDIA, it is CUDA; in the case of AMD, it is the Stream SDK. However, the portable option, which works across GPUs as well as CPUs and is gaining traction, is OpenCL.
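To give a flavour of those extensions, here is a minimal sketch of a CUDA kernel for SAXPY (y = a·x + y); the function name and shape are my own illustration, not any vendor's sample. The __global__ qualifier and the built-in threadIdx, blockIdx and blockDim variables are the extensions; everything else is plain C99. An OpenCL kernel looks much the same, with __kernel and get_global_id() in their place.

    /* One GPU thread computes one element of y = a*x + y. */
    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  /* this thread's index */
        if (i < n)                                      /* guard the tail */
            y[i] = a * x[i] + y[i];
    }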

Computing on the GPU requires a programming model not unlike the well-known MapReduce model: stream processing. It requires you to create a computational kernel, in effect a function, which is then applied in parallel to blocks of data. There are other constraints on the kernel code you can write; recursion, for instance, is restricted, and divergent branching is expensive. Essentially, to take advantage of stream processing, look for problems that involve high arithmetic intensity and near-total data parallelism, as in the sketch below.
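Here is a minimal sketch of the host side of that model, assuming the saxpy kernel above; compiled together with it under nvcc, it should run as written. The pattern is the point: move data to the device, apply the kernel across all of it at once, move the results back.

    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const int n = 1 << 20;                      /* a million elements */
        size_t bytes = n * sizeof(float);

        /* Ordinary host-side arrays. */
        float *x = (float *)malloc(bytes);
        float *y = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        /* Device copies of the data blocks. */
        float *dx, *dy;
        cudaMalloc((void **)&dx, bytes);
        cudaMalloc((void **)&dy, bytes);
        cudaMemcpy(dx, x, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dy, y, bytes, cudaMemcpyHostToDevice);

        /* Apply the kernel: one thread per element, in blocks of 256. */
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        saxpy<<<blocks, threads>>>(n, 2.0f, dx, dy);

        /* Bring the results back and spot-check. */
        cudaMemcpy(y, dy, bytes, cudaMemcpyDeviceToHost);
        printf("y[0] = %f\n", y[0]);                /* expect 4.000000 */

        cudaFree(dx); cudaFree(dy);
        free(x); free(y);
        return 0;
    }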

Bioinformatics, Computational Finance, Medical Imaging, Molecular Dynamics, Weather and Climate Forecasting… anywhere you have a ton of data waiting to be crunched, GPU computing is a perfect fit. Even Hadoop has support for CUDA at this moment. GPUs are now ubiquitous; I'd risk calling them commodity hardware at this point. They sit in almost all of your machines, powering your displays, rendering your games. Never have developers been privy to so much power in so little space. And it's not even a black hole yet. So, go forth and compute!