# A Story about Data, Part 1: The shape of the data

Note about the visualisations: All of the plotting was done with Basis-Processing. You’ll find its source here.

The current dataset I’m working on comes from the education domain. It contains roughly 29,000 records, each listing the following:

• Location of the student’s school
• Language of the student
• Student’s score before intervention
• Student’s score after intervention
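To make the shape concrete, here is one way a record could be held in memory in Ruby. The `StudentRecord` name, its field names, the `gain` helper and the three rows are made up for illustration; they are not the dataset's actual schema or values.

```ruby
# Hypothetical in-memory shape of one record; field names are illustrative.
StudentRecord = Struct.new(:school_location, :language, :score_before, :score_after) do
  # Change in score across the intervention.
  def gain
    score_after - score_before
  end
end

# Made-up rows, for illustration only.
records = [
  StudentRecord.new("School 1", "Language A", 42.0, 55.0),
  StudentRecord.new("School 2", "Language B", 38.0, 41.5),
  StudentRecord.new("School 2", "Language A", 50.0, 48.0)
]

# Mean gain per language: group the records, then average each group's gains.
mean_gain_by_language = {}
records.group_by(&:language).each do |language, group|
  total = group.inject(0.0) { |sum, record| sum + record.gain }
  mean_gain_by_language[language] = total / group.size
end

mean_gain_by_language.each { |language, mean| puts "#{language}: #{mean}" }
```

Grouping by school location instead is the same call with `&:school_location`.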

# Interacting with Graphs: Mouse-over and lambda-queuer

In the previous post, I described how I’d put together a basic system to drive data selection/exploration through a queue. While generating more graphs, it became evident that the code for mouseover interaction followed a specific pattern. More importantly, using Basis to plot stuff forced me to look at the inverse problem: determining the original point from the point under the mouse pointer. In this case, it was fairly simple, since I’m only dealing with 2D points. Here’s a video of what it looks like; the example shows the exploration of a covariance matrix.
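For 2D points the inverse problem comes down to a little linear algebra. Below is a minimal sketch, not Basis’s actual API: it assumes a hypothetical `ScreenMapping` where a data point `p` is drawn at `origin + basis * p`, with `basis` a 2x2 `Matrix` (from Ruby’s stdlib) whose columns are the scaled screen-space basis vectors. Going back from the mouse position is then a multiplication by the inverse matrix. The `nearest` lookup is a plain linear scan in data space; a kd-tree would replace it for large point sets.

```ruby
require 'matrix'

# Hypothetical mapping between data space and screen space (not Basis's API).
# A data point p is drawn at: origin + basis * p
class ScreenMapping
  def initialize(basis, origin)
    @basis = basis            # 2x2 Matrix; columns are scaled screen-space basis vectors
    @inverse = basis.inverse  # computed once; raises if the basis is singular
    @origin = origin          # Vector: screen position of the data-space origin
  end

  # Forward problem: data coordinates -> screen coordinates.
  def to_screen(point)
    @origin + @basis * point
  end

  # Inverse problem: screen coordinates (e.g. the mouse) -> data coordinates.
  def to_data(screen_point)
    @inverse * (screen_point - @origin)
  end

  # Linear scan for the sample closest to the mouse, measured in data space.
  # Returns nil when nothing lies within `tolerance` (in data units).
  def nearest(samples, mouse, tolerance)
    target = to_data(mouse)
    best = samples.min_by { |s| (s - target).norm }
    best if best && (best - target).norm <= tolerance
  end
end

# Usage: x scaled by 2, y scaled by 4 and flipped (screen y grows downwards).
mapping = ScreenMapping.new(Matrix[[2.0, 0.0], [0.0, -4.0]], Vector[100.0, 400.0])
samples = [Vector[0.0, 0.0], Vector[5.0, 10.0], Vector[20.0, 1.0]]
mouse = mapping.to_screen(Vector[5.2, 9.9])
hit = mapping.nearest(samples, mouse, 1.0)
```

Precomputing the inverse once keeps each mouse event down to one matrix-vector product.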

# Driving data visualisation over a queue using RabbitMQ and lambda-queuer

One of the things which has bothered me ever since I took the dive into visualisation is the problem of interactivity. The aim of interacting with a visualisation is to drill down into, or explore, areas of it which are (or seem) interesting. Put another way, we are essentially filtering the data visually. In most cases, mouse interactions may be sufficient. But what if I wanted to filter the data programmatically and have the result reflected in the visualisation?

One way is to simply re-run the code which generates the visualisation each time we use a different filter. This is the simplest approach and, in many cases, enough; but here the code is modified offline. What if we wanted to do the same while the program is running? This post describes my attempt at one such implementation. It is still somewhat primitive, but we’ll see where it ends up. For the purposes of demonstration, I used the Parallel Coordinates visualisation, which is available on GitHub. I’ll continue using Processing through Ruby-Processing for this description.
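Here is a minimal, in-process sketch of the idea. Ruby’s thread-safe `Queue` stands in for RabbitMQ, and lambdas are pushed onto it directly; I’m not assuming anything about how lambda-queuer actually packages or ships filters over the wire. `FilteredView` and `consume_filters` are hypothetical names: the view keeps the full data set plus the current predicate, and a consumer thread swaps in each new predicate as it arrives.

```ruby
require 'thread'

# Hypothetical stand-in for the visualisation: the full data set plus the
# predicate currently used to filter it.
class FilteredView
  def initialize(data)
    @data = data
    @filter = lambda { |_row| true }  # show everything until a filter arrives
    @lock = Mutex.new
  end

  # Called from the consumer thread when a new filter arrives.
  def filter=(predicate)
    @lock.synchronize { @filter = predicate }
  end

  # What draw() would render on the next frame.
  def visible
    predicate = @lock.synchronize { @filter }
    @data.select { |row| predicate.call(row) }
  end
end

# Consumer loop: pop filters off the queue and hand them to the view.
# The symbol :stop ends the loop.
def consume_filters(queue, view)
  Thread.new do
    loop do
      message = queue.pop
      break if message == :stop
      view.filter = message
    end
  end
end

# Usage: the producer side just pushes lambdas onto the queue.
view = FilteredView.new((1..10).to_a)
queue = Queue.new
worker = consume_filters(queue, view)
queue << lambda { |x| x.even? }
queue << :stop
worker.join
puts view.visible.inspect
```

In a Processing sketch, draw() would render `view.visible` on every frame, so a new filter shows up on the next redraw without restarting the program.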

# A guide to using Basis (updated for v0.6.0+)

This is a quick tour of Basis. Find the source for Basis on GitHub. Installing Basis is pretty simple; just grab it as a gem for your JRuby installation. Brief notes on the installation can be found here.

UPDATE: Starting from version 0.6.0, Basis allows you to specify axis labels. Additionally, you can specify arrays of points instead of plotting points one at a time. When you do this, you can also specify a corresponding legend string, which will show up in a legend guide. See below for more details.

UPDATE: Starting from version 0.5.9, you can turn grid lines on or off. Additionally, the matrix operations implementation has been ported to use the Matrix class in Ruby’s stdlib.

UPDATE: Starting from version 0.5.8, you can customise axis labels, and draw arbitrary shapes, text, or custom graphics at any point in your coordinate system. See below for more details.

UPDATE: With version 0.5.7, experimental support has been added for drawing objects which aren’t points. Interactions with such objects are currently not supported. Support for drawing markers and highlights in custom bases has also been added.

UPDATE: Starting from version 0.5.1, Basis has been ported to Ruby 1.9.2 because of the kd-tree library dependency. Currently, there are no plans to maintain Basis compatibility with Ruby 1.8.x. As an aside, I personally recommend using RVM to manage the mess of Ruby/JRuby installations that you’re likely to have on your machine.

UPDATE: Basis has hit version 0.5.0 with experimental support for mouseover interactivity. More work is on the way, but the demo code below is up to date for now, and should be the same as demo.rb on GitHub.

# Data interactions in parallel coordinates: 40x-60x speedup

This is an update on the visualisation post on parallel coordinates. Understanding the Processing model made me realise that it probably wasn’t a good idea to draw all the samples each time draw() was called. Conveniently, draw() does not clear away the previous frame’s graphics unless told to, which makes this easier. In the end, I explicitly drew only the samples under the current mouse position.
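A sketch of what "samples under the mouse" can mean for parallel coordinates, assuming each sample is stored as its screen y-position on every axis and that the axes are ordered left to right. The function name and the 2-pixel default tolerance are illustrative, not taken from the original code. It finds the pair of axes on either side of the mouse, interpolates each sample’s line at the mouse’s x-position, and keeps only the samples within the tolerance.

```ruby
# Hypothetical hit test for a parallel coordinates plot.
#   axis_xs   - screen x-position of each axis, in increasing order
#   polylines - one array per sample, holding its screen y on each axis
# Returns the indices of samples whose line passes within `tolerance`
# pixels (vertically) of the mouse.
def samples_under_mouse(axis_xs, polylines, mouse_x, mouse_y, tolerance = 2.0)
  # Find the pair of adjacent axes whose horizontal span contains the mouse.
  i = (0...(axis_xs.size - 1)).find do |k|
    mouse_x >= axis_xs[k] && mouse_x <= axis_xs[k + 1]
  end
  return [] if i.nil?

  # Fraction of the way from the left axis to the right one.
  t = (mouse_x - axis_xs[i]).to_f / (axis_xs[i + 1] - axis_xs[i])

  polylines.each_index.select do |s|
    y_left = polylines[s][i]
    y_right = polylines[s][i + 1]
    (y_left + t * (y_right - y_left) - mouse_y).abs <= tolerance
  end
end

axis_xs = [0, 100, 200]
polylines = [[0, 100, 50], [100, 100, 0], [20, 30, 40]]
p samples_under_mouse(axis_xs, polylines, 50, 50)  # => [0]
```

Since the frame is not cleared between draw() calls, only the returned indices need to be redrawn in the highlight color, which is where the savings come from.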

The speedup is obvious and massive: whereas the previous version worked well with only 300 samples, the current one handles 18,000 samples without breaking a sweat. At 29,000 samples there is a slight slowdown, but only a slight one: you wait 1 second for the highlighting instead of 6-7.
Here’s the new video, using 18k samples. Notice how much denser the mesh is.

# Data interactions in parallel coordinates

Processing is growing on me. Inspired by the different and (very) interesting data visualisation examples I’ve seen, I decided to take a shot at interacting with the parallel coordinates that I generated here. Of course, I had to reduce the number of samples for this demonstration; it’d slow to an unholy crawl otherwise. For this video, I’ve used 300 samples. The interaction is essentially mouse-hover highlighting of any sample(s) under the pointer. I fiddled with the colors a bit, but decided that a white-on-greyscale scheme would show up better.
Of course, I still haven’t gotten around to labelling the axes; I’ll probably pick that up next. But as the video demonstrates, there’s more to Processing than meets the eye.

PS: The actual demonstration ends around the halfway mark; after that, I was trying to figure out how to stop the bloody recorder.

# Getting ActiveRecord to behave nicely with Ruby-Processing in JRuby

Really, all I wanted to do was use Processing from Ruby. jashkenas has kindly written a gem which does just that. There was only a slight wrinkle: I wanted to pull my data from MySQL through ActiveRecord. JRuby makes this slightly more interesting than usual, so I’m documenting the process here. To start off, install the gem with:

`sudo gem install ruby-processing`

Go into the directory where the ruby-processing gem is installed, and from there into lib/core ({ruby-processing.gem.dir}/lib/core). In my case, this was /usr/lib/ruby/gems/1.8/gems/ruby-processing-1.0.9/lib/core.
Inside, you’ll find a file named jruby-complete.jar. Delete it, because we’ll be replacing it with a fresh (and different) version of jruby-complete.jar: download version 1.6.3 of the JRuby-complete jar file, rename it to jruby-complete.jar, and put it in place of the one we just deleted.

One step remains: this jar file does not contain the activerecord-jdbcmysql-adapter gem. Install it with:

`java -jar jruby-complete.jar -S gem install activerecord-jdbcmysql-adapter --user-install`

You’re good to go now. One more thing: remember to set the ActiveRecord adapter string to “jdbcmysql”, and load the adapter in your code with:

`require 'arjdbc'`

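For reference, a connection set up along these lines could look like the following. The host, database name, credentials and the `Student` model are placeholders I made up; only the `jdbcmysql` adapter string and the `arjdbc` require come from the steps above. Running it needs a reachable MySQL server.

```ruby
require 'rubygems'
require 'active_record'
require 'arjdbc'

# Placeholder connection details; substitute your own database and credentials.
ActiveRecord::Base.establish_connection(
  :adapter  => 'jdbcmysql',
  :host     => 'localhost',
  :database => 'school_data',
  :username => 'user',
  :password => 'secret'
)

# Hypothetical model; by convention ActiveRecord maps it to a `students` table.
class Student < ActiveRecord::Base
end
```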

# Parallel Coordinate visualisation of 28k, 5-dimensional data

This is a visualisation of the same dataset I’ve been using for a while to explore different data mining and visualisation techniques. The axes aren’t labelled yet, and the color coding distinguishes different categories. It looks like a really interesting way to explore the data.
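The layout behind a parallel coordinates plot boils down to placing one vertical axis per dimension and scaling each dimension onto it. This sketch assumes evenly spaced axes and per-dimension min-max scaling; `parallel_coordinates`, its parameters and the 20-pixel margin are illustrative, not the code behind the plot above. Each sample becomes a list of `[x, y]` vertices that can be joined with line segments.

```ruby
# Hypothetical parallel coordinates layout.
#   rows - equal-length numeric arrays, one per sample (at least 2 dimensions)
# Returns, for each sample, the [x, y] screen vertices of its polyline.
def parallel_coordinates(rows, width, height, margin = 20)
  dims = rows.first.size
  raise ArgumentError, "need at least two dimensions" if dims < 2

  # Per-dimension extremes, used for min-max scaling onto each axis.
  mins = (0...dims).map { |d| rows.map { |r| r[d] }.min }
  maxs = (0...dims).map { |d| rows.map { |r| r[d] }.max }
  spacing = (width - 2 * margin).to_f / (dims - 1)
  plot_height = height - 2 * margin

  rows.map do |row|
    (0...dims).map do |d|
      range = maxs[d] - mins[d]
      # A constant dimension has no spread; park it mid-axis.
      fraction = range.zero? ? 0.5 : (row[d] - mins[d]).to_f / range
      x = margin + d * spacing
      # Screen y grows downwards, so larger values are drawn higher up.
      y = height - margin - fraction * plot_height
      [x, y]
    end
  end
end

vertices = parallel_coordinates([[0, 10], [10, 20]], 240, 140)
```

Scaling each dimension independently is what lets axes with very different units share one plot; color coding by category is then just a per-polyline stroke color.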

# A detour through data visualisation

I should have seen it coming. Text communicates well, up to a point. All the analyses I’ve been working on, from Self-Organising Maps to Decision Trees, would be very well served by good, solid visualisation. My current need is a way to visualise data structures effectively, even if it is merely a bunch of nodes which can be expanded/collapsed to show more information. Additionally, it would be nice (though not necessary) for this to happen interactively, but I don’t mind a command line-driven approach. In fact, I prefer the command line; it makes it easier to drive things through a scripting language like Ruby.
So far, I’ve looked at Processing, d3.js and ProtoVis. I like the idea of d3.js: the data-driven approach makes a lot of sense, but I think I need to brush up on quite a bit of CSS-fu to take advantage of its capabilities. Apart from visualisations derived from data mining algorithms, showing the data as-is in an aesthetic manner is also a worthy goal at this point. In particular, the parallel coordinates visualisation caught my eye.
Oh well, at least I know what I’ll be doing for the next few days.