Tag Archives: code

A Story about Data, Part 2: Abandoning the notion of normality

Continuing on with my work, I was just about to conclude the non-normal data of the distribution. However, I remembered reading about different transformations that can be applied to data to make it more normal. Are any such transformations likely to have any effect on the normality (or the lack thereof) of the score data?
I’d read about the Box-Cox family of transformations: essentially proceeding through powers and their inverses, in the quest to improve normality. I decided to try it, using the Jarque-Bera statistic as a measure of the normality of the data.
Continue reading A Story about Data, Part 2: Abandoning the notion of normality

A Story about Data, Part 1: The shape of the data

Note about the visualisations: All of the plotting was done with Basis-Processing. You’ll find its source here.

The current dataset that I’m working comes from the education domain. Roughly, there are 29000 records, each record lists the following:

  • Location of the student’s school
  • Language of the student
  • Student’s score before intervention
  • Student’s score after intervention

Continue reading A Story about Data, Part 1: The shape of the data

Interacting with Graphs : Mouse-over and lambda-queuer

In the previous post, I described how I’d put together a basic system to drive data selection/exploration through a queue. While generating more graphs, it became evident that the code for mouseover interaction followed a specific pattern. More importantly, using Basis to plot stuff, mandated that I look at the inverse problem; namely, determining the original point from the point under the mouse pointer. In this case, it was pretty simple, since I’m only dealing with 2D points. Here’s a video of how it looks like. The example shows the exploration of a covariance matrix.

Continue reading Interacting with Graphs : Mouse-over and lambda-queuer

Driving data visualisation over a queue using RabbitMQ and lambda-queuer

One of the things which has bothered me ever since I took the dive into visualisation is the problem of interactivity. The aim of interacting with a visualisation is to drill down or explore areas of the visualisation which are (or seem) interesting. Put another way, we are essentially filtering the data from a visual standpoint. In most cases, mouse interactions may be sufficient. But what if I wanted to be able to filter the data programmatically and have the result reflected in the visualisation?

One way is to simply re-run the code which generates the visualisation each time we use a different filter. This is the simplest, and, in many cases, enough. In this case, the modification to the code is made in an offline fashion. What if we wanted to do the same, but while the program is running? This describes my attempt at one such implementation. Albeit still somewhat primitive, we’ll see where it ends up. For the purposes of demonstration, I used the Parallel Coordinates visualisation, which is available on GitHub. I’ll continue using Processing through Ruby-Processing for this description.
Continue reading Driving data visualisation over a queue using RabbitMQ and lambda-queuer

Installing the Basis gem (updated for v0.5.1+)

You can use Ruby-Processing in two ways.

    Use the jruby-complete.jar that Ruby-Processing ships, the Gems-in-a-Jar approach. In this mode, all gems you install will be packaged as part of the JAR.
    Using the JRuby already installed on your system.

If you’re following the first approach, first head to the location where the jruby-complete.jar is located, for Ruby-Processing. There, do this:

java -jar jruby-complete.jar -S gem install basis-processing --user-install

Alternatively, if you’re using a conventional JRuby installation, do this:

sudo jruby -S gem install basis-processing

A guide to using Basis (updated for v0.6.0+)

This is a quick tour of Basis. Find the source for Basis on GitHub. Installing Basis is pretty simple; just grab it as a gem for your JRuby installation. Brief notes on the installation can be found here.

UPDATE: Starting from version 0.6.0, Basis allows you to specify axis labels. Additionally, you can specify arrays of points instead of plotting points one at a time. When you do this, you can also specify a corresponding legend string, which will show up in a legend guide. See below for more details.

UPDATE: Starting from version 0.5.9, you can turn grid lines on or off. Additionally, the matrix operations implementation has been ported to use the Matrix class in Ruby’s stdlib.

UPDATE: Starting from version 0.5.8, you can customise axis labels, draw arbitrary shapes/text/plot custom graphics at any point in your coordinate system. See below for more details.

UPDATE: With version 0.5.7, experimental support has been added for drawing objects which aren’t points. Interactions with such objects is currently not supported. Additional support for drawing markers/highlighting in custom bases is now in.

UPDATE: Starting from version 0.5.1, Basis has been ported to Ruby 1.9.2, because of the kd-tree library dependency. Currently, there are no plans of maintaining Basis compatibility with Ruby 1.8.x. As an aside, I personally recommend using RVM to manage the mess of Ruby/JRuby installations that you’re likely to have on your machine.

UPDATE: Basis has hit version 0.5.0 with experimental support for mouseover interactivity. More work is incoming, but the demo code below is up-to-date, for now. The code below should be the same as demo.rb on GitHub.
Continue reading A guide to using Basis (updated for v0.6.0+)

Basis: Plotting arbitrary coordinate systems in Ruby-Processing

One of the first things I realised while working on visualisations in Processing is that a lot of the work required in setting up coordinate systems and plotting them is somewhat of a chore. Specifically, for things like parallel coordinates, multiple axes, each with its own scaling, I initially ended up with some pretty ugly custom code for each case. I did look around in the Libraries section of the Processing website, but didn’t find anything specific to manipulating and plotting coordinate axes.
Continue reading Basis: Plotting arbitrary coordinate systems in Ruby-Processing

Data interactions in parallel coordinates: 40x-60x speedup

This is an update on the visualisation post on parallel coordinates. Understanding the Processing model made me realise that it probably wasn’t a good idea to draw all the samples each time draw() was called. Of course, every refreshed call of draw() does not clear away the previous frame’s graphics, so that just makes it easier. In the end, I went and explicitly drew only the samples which were under the current mouse position.

The speedup is obvious and massive: whereas the previous version worked well with only 300 samples, the current one processes 18000 samples without breaking a sweat. At 29,000 samples, there is a bit of a slowdown, but only just a bit, you wait 1 second for the highlighting instead of 6-7.
Here’s the new video, using 18k samples. Notice how much denser the mesh is.

Data interactions in parallel coordinates

Processing is growing on me. Inspired by the different and (very) interesting data visualisation examples I’ve seen, I decided to take a shot at interacting with the parallel coordinates that I generated here. Of course, I had to reduce the number of samples for this demonstration; it’d slow to a unholy crawl otherwise. For this video, I’ve taken 300 samples. The interaction is essentially a mouse-hover highlighting of any sample(s) under it. I fiddled with the colors a bit, but decided that a white-on-greyscale scheme would show up better.
Of course, I still haven’t gotten around to labeling the axes. This I’ll probably pick up next. But as the video demonstrates, there’s a lot to Processing than meets the eye.

PS: By the way, the actual demonstration ends around the halfway mark; I was trying to figure out how to stop the bloody recorder.