Tag Archives: probability

Decision Trees

Continuing on with my exploration of the data mining landscape, I extracted out a decision tree out of the data under scrutiny. It is the same data (20,000 samples, 56 samples), but the dimensions I’ve chosen for partitioning are a bit different from the raw attributes. I’ve conflated the 56 dimensions into a single number, since this is a test score we’re talking about, and I’m not sure modelling the indivdual responses for constructing the tree would be the best idea. I’m not really looking for close fits, buckets or bins should be adequate for discretising the response space.
Accordingly, I’ve partitioned the pre-scores as EXCELLENT, GOOD, AVERAGE, etc. The attribute that I’m attempting to predict is also a score, but this is the post-score. Well, not really the score itself, the improvement in score is a more sensible metric to attempt to guess.

I had a problem with trying to visualise the data, but I’ve been able to make do with indenting the different levels of decision nodes; this should be fine till I really need to use a visualisation library. I think the code could use a bit of work – probability notation does not lend itself easy to elegant variable naming. I’ll probably write a lot more on this topic once I’m into Bayesian Nets.

Tests increase our Knowledge of a System: A Probabilistic Proof

This was an old proof that was up on my old blog, but since I’m no longer posting to that, I’m reposting it here for posterity. Also, rewriting the equations in LaTeX, now that I have installed a plugin for that.

I present a simple mathematical device to prove that tests improve our understanding of code. It does not really matter if this is code written by the test author himself or is legacy. To do this, some simplification of the situation is necessary.
Continue reading Tests increase our Knowledge of a System: A Probabilistic Proof