Unit Testing AngularJS Controllers, Views, and More: Part One

AngularJS makes a lot of things easy, but one thing you still have to come to grips with is deciding:

  • What to unit test.
  • How to unit test it.
  • How to unit test it easily.

I’m not saying it’s hard, but it can sometimes be convoluted. Let’s answer the first question.

What to unit test

Normally, we want to be unit testing logic. This would usually translate to unit testing controllers. However, because AngularJS allows us to embed expressions inside our templates, it’d be nice to be able to unit test template logic too. To this, people might object that Selenium (for example) would be the tool of choice for end-to-end testing. I agree completely, except that end-to-end testing is not synonymous with controller-view integration testing.

In a past life, when I was working with Silverlight, using MVVM (Model-View-ViewModel), my prevailing philosophy was to move most (if not all) of the view-related logic into the view model, so that we could unit test the view model thoroughly. The only remaining source of bugs would be faulty bindings. As it turns out, this philosophy is quite useful in AngularJS as well. However, I do want to go a step further and unit test these bindings too. I’d also want to be able to test the state of HTML elements. Normally, a higher-level acceptance test would do this, but the higher up the testing abstraction one moves, the harder it is to pinpoint low-level issues when they arise.

Let’s go one step further. I’d also like to be able to unit-test interactions. This basically means verifying what would happen in a controller/view if the user interacted with a certain UI element.

So, in short, what I want to be able to test are:

  • Controllers
  • Template-Controller binding
  • View interactions

In this series, I shall start with a simple project structure and use Duck-Angular, a framework I’ve developed, to show how you can do all of the above. I’ll walk you through the code of Duck; hopefully, this will serve to illustrate the design thinking behind AngularJS too.

What is Duck-Angular?

Duck-Angular is a container for bootstrapping and testing AngularJS views and controllers in memory. No browser or external process is needed. Duck-Angular is available as a Bower package. Install it with ‘bower install duck-angular’.

Include it using RequireJS’ define(). Your controller/service/object initialisation scripts need to have run before you use Duck-Angular. Put them in script tags, or load them using a script loader like RequireJS or Inject.
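For example, a test’s RequireJS configuration might map the Bower package to a module path. This is only a sketch: the path and the module name "duck" below are assumptions, so adjust them to wherever Bower actually installs the package in your project.

    // Usually lives in your test "main" file; the path is an assumption.
    require.config({
      paths: {
        "duck": "/example/static/js/test/lib/bower_components/duck-angular/dist/duck-angular"
      }
    });

    // Any test module can then pull it in via define().
    define(["duck"], function (duck) {
      // duck is now available to the test modules that depend on it
    });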

The Github repository is here. As we go on, we’ll see how Duck-Angular works, and how you can use it in your unit tests.

How to unit test it

Say whatever you will about AngularJS, it is very hackable. Some of that can be attributed to the Javascript language itself, but there are some very good design decisions that the development team has taken. The two points which come to mind are:

  • Copious event publishing
  • Unintrusive dependency injection model

As we shall see, each of these points makes it possible to write unit tests for almost every part of your application code.
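As a small, hedged illustration of the first point: the router and ng-view publish lifecycle events that a test can listen for to know when a route has been resolved and its template rendered. The snippet assumes $rootScope has already been obtained from the injector, and that Q is loaded.

    // Sketch: hooking into AngularJS' own lifecycle events from a test.
    var rendered = Q.defer();

    $rootScope.$on("$routeChangeSuccess", function () {
      // The route has been matched and its controller is about to be set up.
    });

    $rootScope.$on("$viewContentLoaded", function () {
      rendered.resolve(); // The view's template has been compiled and linked.
    });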

Which Libraries?

Before beginning, I shall list out the libraries I’ve used for this example project:

  • Duck-Angular (A container for bootstrapping and testing AngularJS views and controllers in memory: no browser or external process needed)
  • RequireJS (Isolates libraries, including AngularJS for your specific context)
  • Q (Used for bootstrapping the Angular application, and mocking $q)
  • Mocha and Chai (Unit testing framework, and assertion framework, respectively)
  • Sinon (Mocking/Stubbing framework)
  • Mocha-as-Promised and Chai-as-Promised (Extensions to Mocha and Chai to ease handling of promises)

The example project to follow along with is the AngularJS-RequireJS-Seed project. It is in a bit of flux at the moment, but we’ll fix that soon.

Project Structure

With the imminent advent of the ES6 Module System, it only makes sense to structure Javascript code as proper, first-class application code, instead of a bunch of script tags. To this end, I’ve used RequireJS to structure the project. There are several other beneficial effects of using RequireJS apart from this isolation, which we’ll soon see. The folder structure of the app is as below:

js
 └───public
     ├───js
     │   ├───app
     │   │   ├───controller
     │   │   ├───directive
     │   │   ├───factory
     │   │   └───services
     │   ├───lib
     │   └───test
     │       ├───lib
     │       └───unit
     │           ├───controllers
     │           ├───services
     │           └───ui
     └───templates

Some quick notes about the folder structure:

  • The test/lib directory contains test-specific libraries like Mocha, Chai, etc.
  • This folder structure is not an absolute must; it just makes it easier to explain the application organisation.

Apache Server Configuration

The choice of server to serve these static assets is up to you, but just in case you’re using Apache, here’s the snippet from httpd.conf that you might find useful. Of course, you’ll have to modify the local directory path to your JS project.

    ProxyPreserveHost On
    ProxyRequests Off
    ProxyPass /example/static !
    Alias /example/static/ "C:/projects/AngularJS-RequireJS-Seed/example-app/src/main/resources/public/"

    Header set Cache-control "no-cache,must-revalidate"
    Header set pragma "no-cache"
    Header set Expires "Sat, 1 Jan 2000 00:00:00 GMT"

    <Directory "C:/projects/AngularJS-RequireJS-Seed/example-app/src/main/resources/public">
       Options Indexes FollowSymLinks MultiViews ExecCGI
       AllowOverride All
       Order allow,deny
       Allow from all
    </Directory>

Bootstrapping the application

The relevant snippet which starts off the app is in index.html:

<head>
  <title>RequireJS+Angular</title>
    <script src="/example/static/js/lib/require.js" data-main="/example/static/js/app/bootstrap.js"></script>
</head>

Looking at the code above, we see that the first Javascript file to be loaded and run is bootstrap.js. The bootstrap.js code is very short, and is shown below.

    require(["app.config"], function(config) {
    require(["app/app"], function(app) {
    app.bootstrap(app.init());
    });
});

This code loads two files, app.config.js and app.js, in sequence (they should not be loaded in parallel, hence the nested require() calls). Once the app module has been loaded, it is initialised (init()) and bootstrapped (bootstrap()). The two phases are separate so as to allow the developer to add extra test-specific initialisation after init(), but before bootstrap().
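As a hedged illustration of why that gap is useful (service1 and its fetch() method are assumptions made purely for this sketch), a test harness could override a dependency between the two calls:

    // Sketch only: swap a real service for a canned fake before bootstrapping.
    require(["app/app"], function (app) {
      var ngApp = app.init();

      // Test-specific wiring: re-registering "service1" overrides the production one.
      ngApp.factory("service1", function () {
        return { fetch: function () { return Q.when([]); } };
      });

      app.bootstrap(ngApp);
    });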

Now, the question is: what do bootstrap() and init() do? Here is the init() function from the app.js module:

var init = function() {
  var app = angular.module('ExampleApp', []);
  services.init(app);
  libs.init(app);
  directives.init(app);
  controllers.init(app);
  factories.init(app);
  app.config(['$routeProvider', function ($routeProvider) {
    $routeProvider.
      when('/navigation', {templateUrl: '/example/static/templates/navigation.html', controller: 'navigationController'}).
      when('/route1', {templateUrl: '/example/static/templates/route1.html', controller: 'route1Controller'}).
      when('/route2', {templateUrl: '/example/static/templates/route2.html', controller: 'route2Controller'}).
      otherwise({redirectTo: '/navigation'});
  }]);
  return app;
};

Parts of this may be familiar to you, for example, the wiring up of the routes. This is where we manually set up the application module, naming it “ExampleApp”. To set up controllers, services, factories, etc., I follow a process which might seem a little unfamiliar. Not to worry, it’s pretty simple. The core idea here, which will resurface when we actually get down to testing the entities, is that every Angular controller, factory, service, etc. starts off as a RequireJS module.

Make every Angular entity start off as a RequireJS module. Why?

Why is this important? Think of any AngularJS controller. You declare it like so:

    angular.controller("SomeController", function(...) {
       // Controller code
}

Now, let’s pause for a second and think about how we would get hold of an instance of that controller if we ever wanted to unit test it. Well, it is an AngularJS controller, so you’d need to ask AngularJS to instantiate it, using AngularJS’ dependency injection framework (in this case, we’d use $injector.get(), or the $controller service).

This implies that you will need to at least initialise your AngularJS app before you can use a controller object. Kind of sucks, huh? But, if the controller’s constructor function started out as a RequireJS module, you’ll not need to bootstrap your app to create a controller. All you really do is call a constructor function, and you’re done. Not only that, you have full control over what dependencies you inject inside your controller, because you’re invoking the constructor function yourself. Later on, we’ll see how to specify only a subset of a controller’s dependencies, while the rest are populated by their default (production) dependencies.
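As a rough sketch (the controller body, the load() method, and the fake service below are invented for illustration, not lifted from the seed project), a unit test can construct the controller with exactly the collaborators it cares about:

    // route1-controller.js: the constructor is just a RequireJS module (hypothetical body).
    define([], function () {
      return function ($scope, service1, $q) {
        $scope.load = function () {
          return service1.fetch().then(function (data) {
            $scope.data = data;
          });
        };
      };
    });

    // A Mocha/Chai test: no Angular bootstrap, no $injector; just a function call.
    define(["route1-controller"], function (Route1Ctrl) {
      describe("route1Controller", function () {
        it("puts fetched data on the scope", function () {
          var scope = {};
          var fakeService1 = { fetch: function () { return Q.when([1, 2, 3]); } };
          Route1Ctrl(scope, fakeService1, Q); // hand-pick every dependency
          return scope.load().then(function () {
            chai.expect(scope.data).to.deep.equal([1, 2, 3]);
          });
        });
      });
    });

Mocha-as-Promised lets the test return the promise directly, so no done() callback is needed.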

So, to register controllers with AngularJS, controllers.js does this:

  define(["route1-controller",
  "route2-controller",
  "navigation-controller"
], function (route1Ctrl, route2Ctrl, navigationCtrl) {
    var init = function (app) {
      app.controller('route1Controller', ['$scope', 'service1', '$q', route1Ctrl]);
      app.controller('route2Controller', ['$scope', '$location', '$q', 'service2', route2Ctrl]);
      app.controller('navigationController', ['$scope', '$location', navigationCtrl]);
  };
  return {init: init};
});

All it does is return an object with an init() function. This function takes the controller constructor functions (route1Ctrl, route2Ctrl, navigationCtrl) and registers them with AngularJS as usual. The init() function is triggered in app.js, like so:

    controllers.init(app);

The services, factories, and directives are registered with AngularJS in exactly the same way. What does the bootstrap() method in app.js do? Let’s see.

var bootstrap = function(app) {
  var deferred = Q.defer();
  var injector = angular.bootstrap($('#ExampleApp'), ['ExampleApp']);
  deferred.resolve([injector, app]);
  return deferred.promise;
};

It simply takes the Angular app object (which has hopefully been initialised inside the init() method) and actually bootstraps it, binding it to a DOM element in the HTML. It then returns a Q promise, which resolves with two values.

  • The injector, which is responsible for Angular’s dependency injection. This is not so useful in the actual production code, but is handy in unit tests, when you need to get a handle to something registered with AngularJS.
  • The app, which is the app object itself. You could use it in your unit tests to perform further configuration changes.

The above two methods, used in conjunction inside bootstrap.js, start up our app.
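To make the two points above concrete, here is a hedged sketch of how a test might consume that promise to fish a registered object out of the injector (the service name is an assumption):

    define(["app/app"], function (app) {
      describe("bootstrap", function () {
        it("exposes registered objects through the injector", function () {
          return app.bootstrap(app.init()).then(function (values) {
            var injector = values[0]; // the promise resolves with [injector, app]
            var service1 = injector.get("service1"); // assumed to be registered in services.js
            chai.expect(service1).to.exist;
          });
        });
      });
    });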

In the next post, I shall discuss how the test environment is set up.

RequireJS-AngularJS solutions for (almost) every constraint

Update: I’ve added a seed project which uses AngularJS and RequireJS, for option #3 below. It’s located here.

So, you like RequireJS. No, you probably adore it. Enough to want to evangelise its benefits to anyone who will listen. And now you have AngularJS, which is pretty neat. Or so you think. At least, you’re probably using it on your project.

There’s one problem. Your team has not bought into the benefits of module loaders. They think nothing of putting a zillion script tags in your HTML. It doesn’t matter whether the opposing faction is one tech lead, or a few peer developers. Or maybe they are fine with the idea of module loading, but balk at the idea of wrapping all their Javascript code in “ugly” define()’s.

I’ll describe ways to inject RequireJS into your AngularJS projects with minimal disruption, depending upon the set of constraints you find yourself saddled with. I usually say that there’s no “best” way to use RequireJS in your project. However, there are certainly preferred ways to use it, and I’ll note them as I go along.

This post is still a work in progress; expect updates.

1) Use RequireJS piecemeal, wherever you’re writing new code.

  • You’ll use both Angular’s Dependency Injection and RequireJS.
  • This does not get rid of top-level script tags because you’re not in control of the project structure/Angular bootstrapping.
  • You’ll have RequireJS configurations inside your controllers, which might look a bit ugly.
  • You will have to set up Angular controllers/factories/services, in order to unit test them. Since the smaller building blocks can simply come from RequireJS modules, some unit tests might not need this initialisation.

2) Use RequireJS during bootstrap, using Angular’s Dependency Injection for all modularisation.

  • You won’t utilise RequireJS’ module dependency resolution features with this method.
  • You will get rid of script tags.
  • You will necessarily have to bootstrap your Angular app manually with this approach.
  • Angular controller/service/object declarations will not need to be defined inside RequireJS modules.
  • You will have to set up Angular controllers/factories/services, in order to unit test them. Since all building blocks will be Angular components, you’ll have to do this for every unit test.

3) Use RequireJS during bootstrap, using Angular’s Dependency Injection for service/factory/controller/coarse-grained objects only. All service/factory/controller declarations come from RequireJS modules.

Note: I prefer this approach; a brief sketch of it follows the list below.

  • This allows you to unit test controller/service logic without having to initialise Angular modules in unit tests.
  • You will get rid of script tags.
  • You will necessarily have to bootstrap your Angular app manually with this approach.
  • You’ll invoke RequireJS inside controllers/services, etc., as needed.
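A hedged sketch of what this looks like in practice (the service name, its body, and the URL are illustrative only):

    // service1.js: a plain RequireJS module, unit-testable without Angular.
    define([], function () {
      return function ($http) {
        this.fetch = function () {
          return $http.get("/api/items").then(function (response) {
            return response.data;
          });
        };
      };
    });

    // services.js: the only place Angular's DI ever sees the constructor.
    define(["service1"], function (Service1) {
      return {
        init: function (app) {
          app.service("service1", ["$http", Service1]);
        }
      };
    });

In a unit test, you require service1.js directly and hand it a fake $http, exactly as with the controllers in the seed project.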

ThoughtWorks Tech Radar : Mechanical Sympathy

My take on Mechanical Sympathy (from the ThoughtWorks Technology Radar), which I presented at the Sheraton Bangalore, is based off the content below.

“The most amazing achievement of the computer software industry is its continuing cancellation of the steady and staggering gains made by the computer hardware industry.” – Henry Petroski

“Premature optimisation is the root of all evil.” – Donald Knuth

The Hibernian Express is the first transatlantic fiber-optic communications cable to be laid in 10 years, at a cost of $300 million. The current speed record for establishing transatlantic communication is 65 milliseconds. The Hibernian Express will reduce that. By all of 6 milliseconds. If that isn’t a very expensive optimisation, I do not know what is.

In almost all cases, the code that we write is abstracted away from the internals of the hardware. This is a desirable and necessary thing. However, particular domains require applications to operate under a set of exacting constraints. Ultra-Low Latency Trading in the HFT arena, an area of recent interest, typically requires order volumes of over 5,000 orders a second, with order and execution report round-trip times of 100 microseconds. In such cases, tailoring your architecture to handle concurrency is no longer an idle option; it is a necessity. Even for more prosaic applications, it is not uncommon to need low latency data structures.

Usually, requiring low latency boils down to minimising time spent in concurrency management with respect to actual logic processing. Today’s programming languages provide a variety of constructs to model concurrent operations. Locks, mutexes, memory barriers, to name a few. Even at the opcode level, you may use CAS operations, which are cheaper than locks. However, to move to the upper end of the curve, to get to really low latency, many designers eschew all of these constructs.

One good example is the Disruptor, which is a high performance concurrency framework for Java. In a series of excellent articles, Martin Thompson, one of the authors of the Disruptor framework, discusses techniques to reduce latency by write combining, writing lock free algorithms, and the Single Writer principle.

Even if lock contention is an issue, there are other ways of reducing latency. One example is a team working to increase the performance of their custom JMS implementation, who wrote their own implementation of the JDK Executor interface (the Executor interface is responsible for firing off Runnable jobs, by the way). This resulted in an improvement by a factor of 10.

One of the more explicit forms of mechanical sympathy is when you rewrite software to execute on specially designed hardware. GPUs and FPGAs are commonplace in financial computing.

Indirectly, this form of thinking also seems to have influenced the design of single-threaded servers with asynchronous I/O. In a multi-threaded server, you, or rather the server, are constantly faced with having to switch contexts between threads. With a single-threaded model, latency is greatly reduced.

ThoughtWorks Tech Radar : Agile Analytics

My take on Agile Analytics from the ThoughtWorks Technology Radar, which I presented at the Sheraton Bangalore today, is based off the following document.

Patient: Will I survive this risky operation?
Surgeon: Yes, I’m absolutely sure that you will survive the operation.
Patient: How can you be so sure?
Surgeon: Well, 9 out of 10 patients die in this operation, and yesterday my ninth patient died.

Andrew Lang, a Scottish writer and collector of folk tales, once remarked that many people use statistics as a drunken man uses lamp-posts…for support rather than illumination. Even so, we have come a long way from the 9th century, when Al Kindi used statistics to decipher encrypted messages and developed the first code breaking algorithm, in Baghdad – incidentally, he was instrumental in introducing the base 10 Indian numeral system to the Islamic and the Christian world.

1654 – Pascal and Fermat create the mathematical theory of probability,
1761 – Thomas Bayes proves Bayes’ theorem,
1948 – Shannon’s Mathematical Theory of Communication defines capacity of communication channels in terms of probabilities. Bit of a game changer, that one. All our designs of communication networks and error-correction algorithms stem from insights found in that work.

Today, we realise that the pace at which we collect data far exceeds our capability to make sense of it. Data is everywhere, *literally*. The blood cells in your body trying to determine whether that molecule is an oxygen molecule or not? That is data. Your build breaking? That is data. You’re running a static analysis tool to check your test coverage? Yeah, that is data analysis.
Unfortunately, we are at the point where our opinions about whether a piece of data is relevant to analysis form far too slowly. How slowly? Well, human reflexes take milliseconds, while CPUs and GPUs function on the order of nanoseconds. That is six orders of magnitude. And that is how slow we are.

This, we cannot afford to be. In the past century, data collection was the bottleneck. Datasets larger than a few kilobytes were unheard of. Now, we are playing in gigabyte territory. When I was consulting with a telecommunications company, a few months back, all calls through their network would generate upwards of 600 MB of data per day.

Volume is not the only dimension of this deluge of data. The rate of flow of incoming data gives us pause too. Think of the stock markets: imagine having to make decisions based on data which, within a few minutes (or even a few seconds), will become obsolete. Analytics is not a goal in itself. It is merely an aid to decision-making. Given the speed at which new data is collected, and the speed at which old data fades into obsolescence, we must be prepared to deal with incomplete, fast-flowing data.

Think of it as a stream from which you scoop a handful of water to determine the level of bacteria in the water. You only have limited information from a single sample, but, if you sample from multiple points upstream and downstream, you’ll finally get a fairly correct answer to your question.

Agile Analytics conjures up images of iterations, collaborating with customers, and fast feedback when working on DW/BI projects. Indeed, this is what Ken Collier talks about in his book Agile Analytics. However, I wish to tackle a different angle. Hal Varian, Chief Economist at Google, believes that the dream job of this decade is that of a statistician. Everyone has data. It’s harder to get opinions about the data. It’s harder to, as he says, “tell a story about this data”.
We’re at a moment in the software industry where lots of things have begun to intersect with our field of interest. Statistics is one of them. Assume you are a software engineer, and have more than a peripheral interest in this field. What do you do?

Learn classical statistics. Learn Bayesian statistics. You probably hated those textbooks, so don’t use them; there are tons of more useful educational resources on the Web. Get into machine learning. Understand that machine learning is not some super-exotic field of study. I’ll risk a limb and say that Machine Learning is just More Statistics under a trendy name.
Get acquainted with a few languages and libraries: R, NumPy, Julia. In fact, I’m super-excited by Julia because it offers native building blocks for distributed computation. Read a few papers on real-world distributed systems.

I do not talk about this because you’ll be building a distributed analytics engine from scratch (though you could). You will, through study of the subjects above, gain a much deeper understanding of why you should be analysing something, and also how such systems are built. You’re all, regardless of your previous background, engineers.

You will also encounter a lot of literature concerning visualisation while doing this. Visualisation is one of those things we don’t really pay much attention to until we really need it. Bars, graphs, colours: anything in lieu of numbers that can give us some visual indication of what’s going on. Health check pages, for example, are a useful way of integrating diagnostic information about a system.

ThoughtWorks Tech Radar : GPGPU

My take on GPGPUs from the ThoughtWorks Technology Radar, which I presented at the Sheraton Bangalore today, is based off the following document.

Seth Lloyd, a professor of mechanical engineering at MIT, once asked what the fastest computer in the universe would look like. Setting aside concerns of fabrication, a circuit is only as fast as the speed at which you can flip a bit from 0 to 1, or vice versa. The faster the circuit, the more energy it consumes. Plugging in theoretical numbers, Lloyd came to the conclusion that a reasonably sized computer running at that speed would not look like one of the contraptions in front of you. In fact, not to put too fine a point on it, it would become a black hole.

Well, we are somewhat far away from that realisation, but the fact is that most of us do not realise the potential that exists in each and every one of our laptops and desktops. Allow me an example. The ATI Radeon 5870 GPU, codenamed Osprey, packs enough processing units to support 30,000 threads. Not to put too fine a point on it: I could take this room and all of you in it, replicate it 3,000 times, have each of you do a calculation, and this chunk of sand and solder would still be faster. And smaller.

We are at the point where vendors have begun to release GPUs which are not designed for graphics processing at all. I hesitate to term such units GPUs; take, for example, the NVidia Tesla. The Tesla is not even capable of outputting to video, by default. Yet it powers the Tianhe-1, the second fastest supercomputer on the planet. Again, take the Titan supercomputer. Running on close to 20,000 Tesla GPUs, it is a public access supercomputer, meaning that should you feel the need to do some terascale research, you can log onto it right now.

Today, there are multiple streams of vendor-specific GPU technology. They are all based on the venerable C99 standard, with language extensions. In the case of NVidia, it is CUDA; in the case of AMD, it is the Stream Processing SDK. However, the portable option, which works across GPUs as well as CPUs and is gaining traction, is OpenCL.

Computing on the GPU requires a programming model not unlike the well-known MapReduce model: stream processing. It requires you to create a computational kernel, in effect a function, which is then applied to blocks of data. There are other constraints on the kernel code that you can write. Essentially, to take advantage of stream processing, look for problems which involve high compute intensity and near-total data parallelism.

Bioinformatics, Computational Finance, Medical Imaging, Molecular Dynamics, Weather and Climate Forecasting…anywhere you have a ton of data waiting to be crunched, GPU computing is a perfect fit. Even Hadoop has support for CUDA at this moment. GPUs are now ubiquitous; I’d probably risk calling them commodity hardware at this point. They sit in almost all of your machines, powering your displays, rendering your games. Never have developers been privy to so much power within so little space. And it’s not even a black hole yet. So, go forth and compute!

Two-phase commit : Indistinguishable state failure scenario

I’ll review the most interesting failure scenario for the 2PC protocol. There are excellent explanations of 2PC out there, and I won’t bother too much with the basic explanation. The focus of this post is a walkthrough of the indistinguishable state scenario, where neither a global commit, nor a global abort command can be issued. Continue reading Two-phase commit : Indistinguishable state failure scenario

Parallelisation : Writing a linear matrix algorithm for Map-Reduce

There are multiple ways to skin matrix multiplication. If you begin to think about it, there are probably 4 or 5 ways in which you could approach matrix multiplication. In this post, we look at another, easier, way of multiplying two matrices, and attempt to build a MapReduce version of the algorithm. Before we dive into the code itself, we’ll quickly review the actual algebraic process we’re trying to parallelise. Continue reading Parallelisation : Writing a linear matrix algorithm for Map-Reduce

Parallelisation : Refactoring a recursive block matrix algorithm for Map-Reduce

I’ve recently gotten interested in the parallelisation of algorithms in general; specifically, the type of algorithm design compatible with the MapReduce model of programming. Given that I’ll probably be dealing with bigger quantities of data in the near future, it behooves me to start thinking actively about parallelisation. In this post, I will look at the matrix multiplication algorithm which uses block decomposition to recursively compute the product of two matrices. I have spoken of the general idea here; you may want to read that first for the linear algebra groundwork, before continuing on with this post. Continue reading Parallelisation : Refactoring a recursive block matrix algorithm for Map-Reduce

A Story about Data, Part 2: Abandoning the notion of normality

Continuing on with my work, I was just about to conclude that the data was not normally distributed. However, I remembered reading about different transformations that can be applied to data to make it more normal. Are any such transformations likely to have any effect on the normality (or the lack thereof) of the score data?
I’d read about the Box-Cox family of transformations: essentially proceeding through powers and their inverses, in the quest to improve normality. I decided to try it, using the Jarque-Bera statistic as a measure of the normality of the data.
Continue reading A Story about Data, Part 2: Abandoning the notion of normality
