I’ve been following the Linear Algebra series of lectures from MIT’s OpenCourseWare site. While watching Lecture 3 (I’m at Lecture 6 now), Professor Strang enumerates 5 methods of matrix multiplication. Two of those provided insights I wish my school teachers had provided me, but it was the fifth method which got me thinking.
The method is really a meta-method, and is a way of breaking down multiplication of large matrices in a recursive fashion. To demonstrate the idea, here’s a simple multiplication between two 2×2 matrices.
Simple, right? Now, imagine two very large matrices (for clarity, I shall stick to square matrices for now). Here’s a very simple fact that Professor Strang states which we can exploit, namely: the rules of matrix multiplication hold even if the elements of a matrix are themselves matrices.
Let that sink in. We’re saying that if we break up a massive matrix into 4 quadrants, and treat the collection of rows and columns in each quadrant as an independent matrix, the final result can be obtained by following the same rules of multiplication that we followed. It is as if each quadrant is now an element in its own right.
These are called Block Matrices.
Pictorially, this is what it means:
The A11, A12, A21, A22 and B11, B12, B21, B22 in the above figure are actually matrices obtained by partitioning the huge matrices into 4
quadrants block matrices.
As a result, if you do have 2 large matrices, you can partition them into smaller matrices recursively, until you get to reasonably-sized matrices which you can actually multiply quickly.
The calculations of each
partition block matrix can be parallelised, the ‘reduce’ stages simply wait for all the ‘map’ jobs to return. This is a hierarchical map-reduce, with each breakup potentially spawning multiple, parallel, subjobs.
In theory, there’s nothing stopping us from partitioning into 5, 6, or any other number, but you’ll see how 4 ties into where I’m headed with this.
Alright, now that we have that fundamental insight into matrix multiplication, lets shift gears a bit and look at what Quadtree Segmentation.
Here’s a simple transformation that we’ll run on this image.
1) If the variation in a block is below X, set the color of the block to the average color of all the pixels in the block. Exit current context.
2) If the variation in a block is above X, partition the image into 4 quadrants.
3) Go to step 1 for each quadrant.
The variation that I spoke of can be any sort of measure. In the image above, it can simply be the statistical variance of hue of the pixels in HSL color space.
What does this have to do with matrix multiplication?
Well, depending upon the nature of a partition, we might decide to not partition it further. Why would we not want to? I can think of a few reasons.
- The matrix is an identity matrix.
- The matrix is sparse.
- The matrix is a permutation matrix (permutation matrices are a class to which identity matrices belong to).
Some of these cases can be computed more efficiently than actually performing a matrix multiplication. Thus, these above conditions basically take the place of “variance” which we used in the image example.
The logic for matrix multiplication is then:
1) If one (or both) submatrix satisfies a “efficient calculation” criterion, calculate the product (hopefully, efficiently). Exit current context.
2) If a submatrix is small enough for its product to be calculated, calculate the product. Exit current context.
2) If both the submatrices is still too big, partition them into 4 partitions.
3) Go to step 1 for each partition.
Now to write some code to demonstrate this. But, before that, I probably need to enumerate which conditions can be easily identified for efficient calculation of matrix products. Now that is an entirely different story…