Tag Archives: data mining

A Story about Data, Part 1: The shape of the data

Note about the visualisations: All of the plotting was done with Basis-Processing. You’ll find its source here.

The current dataset that I’m working on comes from the education domain. There are roughly 29,000 records, and each record lists the following:

  • Location of the student’s school
  • Language of the student
  • Student’s score before intervention
  • Student’s score after intervention
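To make that shape concrete, here is a minimal sketch of one record as a Python dataclass, plus a helper that averages the before/after change within groups. The field names, the `score_change` property and `mean_change_by` are all hypothetical, since the post does not show the actual file format or schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudentRecord:
    """One row of the dataset; field names are illustrative, not the original schema."""
    school_location: str
    language: str
    score_before: float
    score_after: float

    @property
    def score_change(self) -> float:
        # Positive when the post-intervention score is higher than the pre-intervention one.
        return self.score_after - self.score_before


def mean_change_by(records, key):
    """Average score change per distinct value of a record attribute (e.g. 'language')."""
    totals = {}
    for r in records:
        group = getattr(r, key)
        total, count = totals.get(group, (0.0, 0))
        totals[group] = (total + r.score_change, count + 1)
    return {g: total / count for g, (total, count) in totals.items()}
```

For example, with three made-up records, `StudentRecord("urban", "lang_a", 40, 55)`, `StudentRecord("rural", "lang_b", 30, 36)` and `StudentRecord("urban", "lang_b", 50, 52)`, calling `mean_change_by(records, "language")` gives `{"lang_a": 15.0, "lang_b": 4.0}`.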


Interacting with Graphs: Mouse-over and lambda-queuer

In the previous post, I described how I’d put together a basic system to drive data selection and exploration through a queue. While generating more graphs, it became evident that the code for mouseover interaction followed a specific pattern. More importantly, using Basis to plot things forced me to look at the inverse problem: determining the original data point from the screen point under the mouse pointer. In this case, it was fairly simple, since I’m only dealing with 2D points. Here’s a video of what it looks like. The example shows the exploration of a covariance matrix.
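Basis-Processing’s actual API is not shown here, so the sketch below assumes a simple model of what such a plot does: a 2D data point (x, y) is drawn at `origin + x*u + y*v` on screen, where `u` and `v` are the screen-space basis vectors. Under that assumption, the inverse problem reduces to solving a 2×2 linear system. The function names (`to_screen`, `to_data`, `point_under_mouse`) are hypothetical.

```python
import math


def to_screen(point, origin, u, v):
    """Map a 2D data point to screen space as origin + x*u + y*v (assumed model)."""
    x, y = point
    return (origin[0] + x * u[0] + y * v[0],
            origin[1] + x * u[1] + y * v[1])


def to_data(screen, origin, u, v):
    """Invert to_screen by solving [u v] * (x, y) = screen - origin."""
    det = u[0] * v[1] - v[0] * u[1]
    if det == 0:
        raise ValueError("basis vectors are collinear; the mapping is not invertible")
    dx, dy = screen[0] - origin[0], screen[1] - origin[1]
    # Cramer's rule for the two unknowns.
    x = (dx * v[1] - v[0] * dy) / det
    y = (u[0] * dy - dx * u[1]) / det
    return (x, y)


def point_under_mouse(mouse, points, origin, u, v, radius=5.0):
    """Index of the plotted point nearest the mouse, or None if none is within `radius` pixels."""
    best, best_dist = None, radius
    for i, p in enumerate(points):
        sx, sy = to_screen(p, origin, u, v)
        d = math.hypot(sx - mouse[0], sy - mouse[1])
        if d <= best_dist:
            best, best_dist = i, d
    return best
```

With `origin=(100, 400)`, `u=(50, 0)` and `v=(0, -50)` (the negative `v` flips the y axis so that larger values are drawn higher up), the data point `(2, 3)` lands at screen `(200, 250)`, and `to_data((200, 250), ...)` recovers `(2.0, 3.0)`. The picking test is done in screen space so that the tolerance is measured in pixels regardless of how the axes are scaled.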


A detour through data visualisation

I should have seen it coming. Text communicates well – up to a point. All the analyses I’ve been working on, ranging from Self-Organising Maps to Decision Trees, are very well served by good, solid visualisation. What I need right now is a way to visualise data structures effectively, even if it is merely a bunch of nodes that can be expanded or collapsed to show more information. It would also be nice, though not necessary, for this to happen interactively, but I don’t mind a command-line-driven approach. In fact, I prefer the command line; it makes the whole thing easier to drive from a scripting language like Ruby.
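As a rough illustration of the expand/collapse idea driven from the command line, here is a minimal sketch of a collapsible tree that renders only its visible nodes as indented text. The `Node` class, the `render` function and the `+`/`-` markers are hypothetical choices, not part of any library mentioned here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    expanded: bool = False

    def toggle(self):
        # Flip between collapsed and expanded.
        self.expanded = not self.expanded


def render(node, depth=0):
    """Return text lines for the visible part of the tree, two spaces per level."""
    # '+' = collapsed with children, '-' = expanded, ' ' = leaf.
    marker = " " if not node.children else ("-" if node.expanded else "+")
    lines = ["  " * depth + f"{marker} {node.label}"]
    if node.expanded:
        for child in node.children:
            lines.extend(render(child, depth + 1))
    return lines
```

A collapsed `Node("split: language", [Node("leaf: A"), Node("leaf: B")])` renders as the single line `+ split: language`; after `toggle()`, its two children appear indented beneath it. Keeping the state on the nodes and rendering from scratch each time makes it straightforward to drive toggles from a script.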
So far, I’ve looked at Processing, d3.js and Protovis. I like the idea behind d3.js: the data-driven approach makes a lot of sense, but I think I need to brush up on quite a bit of CSS-fu to take advantage of its capabilities. Apart from visualisations derived from data mining algorithms, presenting the raw data in an aesthetically pleasing way is also a worthy goal at this point. In particular, the parallel coordinates visualisation caught my eye.
Oh well, at least I know what I’ll be doing for the next few days.
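For reference, the core of a parallel coordinates plot is small: each numeric dimension gets its own vertical axis, each record is min-max normalised per dimension, and the record becomes a polyline across the axes. The sketch below computes those polylines in screen coordinates, assuming purely numeric rows (categorical fields such as location or language would need an encoding first). The function name and parameters are hypothetical.

```python
def parallel_coordinates(rows, width, height):
    """Turn rows of numeric values into polylines for a parallel-coordinates plot."""
    dims = len(rows[0])
    lows = [min(r[i] for r in rows) for i in range(dims)]
    highs = [max(r[i] for r in rows) for i in range(dims)]
    # Axes are spread evenly across the available width.
    spacing = width / (dims - 1) if dims > 1 else 0.0
    lines = []
    for r in rows:
        pts = []
        for i, value in enumerate(r):
            span = highs[i] - lows[i]
            t = (value - lows[i]) / span if span else 0.5  # constant column -> mid-height
            pts.append((i * spacing, height * (1.0 - t)))   # screen y grows downwards
        lines.append(pts)
    return lines
```

With the made-up rows `[(40, 55), (30, 36), (50, 52)]` (say, score before and after) on a 100×100 canvas, the first row becomes the polyline `[(0.0, 50.0), (100.0, 0.0)]`. Normalising each axis independently is what lets dimensions with very different ranges share one plot.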