Applications of Linear Algebra in Image Filters [Part II] - Separable Filters
This is [Part II] of a two-part post. Please read [Part I] first for more context.
In [Part I] of this blog post, we saw how different linear operations, when applied to images, change their properties. But certain filters, such as edge detection, blurring, sharpening, and feature detectors, make use of convolution, which is a computationally heavy process. Here, linear algebra comes to the rescue: we are going to use Separable Filters to reduce the computational complexity of applying such filters. This is achieved through a linear algebra technique called Low Rank Approximation, computed via Singular Value Decomposition (SVD).
1: Singular Value Decomposition:
Singular value decomposition takes a rectangular matrix A (where A is an n x p matrix) and decomposes it into a product of 3 matrices. The SVD theorem states: A = U S V^T
where U is an n x n unitary matrix, S is an n x p rectangular diagonal matrix with non-negative real numbers on the diagonal, and V is a p x p real or complex unitary matrix. Calculating the SVD consists of finding the eigenvalues and eigenvectors of AA^T and A^TA (where A^T denotes the transpose of A). The eigenvectors of A^TA make up the columns of V, and the eigenvectors of AA^T make up the columns of U. The singular values in S are the square roots of the eigenvalues of AA^T (equivalently, of A^TA); they are the diagonal entries of S, arranged in descending order, and they are always real numbers. If the matrix A is real, then U and V are also real. To understand how to solve for the SVD, let's take the example of the matrix:
So, to find the eigenvalues of the above matrix we compute the matrices AA^T and A^TA. As previously stated, the eigenvectors of AA^T make up the columns of U, so we can do the following analysis to find U.
Now that we have an n x n matrix W = AA^T, we can determine its eigenvalues. Since Wx = λx, we have: (W - λI)x = 0
For this system to have a non-trivial solution x, the determinant of the matrix (W - λI) must be equal to zero. Thus, from the solution of the characteristic equation |W - λI| = 0 we obtain: λ = 0, λ = 0, λ = 29.883, λ = 0.117 (four eigenvalues, since the characteristic polynomial is of fourth degree). These values can be used to determine the eigenvectors that are placed in the columns of U. Substituting the smaller non-zero eigenvalue, λ = 0.117, we obtain the following equations:
19.883(x1) + 14(x2) = 0
14(x1) + 9.883(x2) = 0
x3 = 0
x4 = 0
Upon simplifying the first two equations, we obtain a ratio which relates the value of x1 to x2. The values of x1 and x2 are then chosen such that the eigenvector has unit length (the columns of U are orthonormal). Thus, a solution that satisfies the above equations is x1 = -0.58, x2 = 0.82, and x3 = x4 = 0 (this is the second column of the U matrix).
Substituting the other eigenvalue, λ = 29.883, we obtain:
-9.883 x1 + 14 x2 = 0
14 x1 - 19.883 x2 = 0
x3 = 0
x4 = 0
Thus, a solution that satisfies this set of equations is x1 = 0.82, x2 = -0.58, and x3 = x4 = 0 (this is the first column of the U matrix). Combining these columns, we obtain:
Similarly, the eigenvectors of A^TA make up the columns of V, so we can do a similar analysis to find V.
And similarly we obtain the expression:
Finally, as mentioned previously, the diagonal entries of S are the square roots of the eigenvalues of AA^T (or A^TA), so S can be obtained directly, giving us:
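As a quick numerical sanity check, here is a minimal NumPy sketch of the same procedure. The example matrix A below is an assumption: the original example matrix is not reproduced here, so this one is chosen only so that AA^T has the entries (20, 14, 10) appearing in the equations above.

```python
import numpy as np

# Assumed example matrix (the original is not shown above); it is picked so that
# A @ A.T has the entries 20, 14 and 10 used in the hand calculation.
A = np.array([[2.0, 4.0],
              [1.0, 3.0],
              [0.0, 0.0],
              [0.0, 0.0]])

# Route described in the text: eigenvalues of A A^T and A^T A share their
# non-zero values, and the singular values are their square roots.
eig_u = np.linalg.eigvalsh(A @ A.T)   # ascending order; eigh would also give U's columns
eig_v = np.linalg.eigvalsh(A.T @ A)   # eigh would also give V's columns
print("eigenvalues of A A^T:", np.round(eig_u, 3))
print("singular values:     ", np.round(np.sqrt(eig_v[::-1]), 3))

# Direct SVD for comparison; U diag(S) V^T reproduces A.
U, S, Vt = np.linalg.svd(A)
print("S from np.linalg.svd:", np.round(S, 3))
print("reconstruction ok:   ", np.allclose((U[:, :2] * S) @ Vt, A))
```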
2: Separable Filters:
Let's say we want to filter an image: sharpen it, blur it, maybe detect edges or other features. Given a 2D image filter of size MxN, computing it requires MxN independent, sequential memory accesses accompanied by MxN multiply-add operations per pixel. For large filters this can easily get prohibitively expensive, and we get quadratic scaling with the filter's spatial extent. This is where separable filters come to the rescue. If a filter is separable, we can decompose it into a sequence of two 1D filters in different directions (usually horizontal, and then vertical). Each pass applies a 1D filter: the first with M taps and the second with N taps, for a total of M+N operations per pixel.
Take a square filter kernel of size L×L. To apply it, you need to convolve it with the picture, which has M×N pixels, each needing L² multiply-accumulates. So you end up with A(2D) = (L*L)MN operations.
Now, if you can decompose that filter into an L-sized horizontal and an L-sized vertical 1D filter, you can first filter all rows (each output value needs L operations, giving LMN for the whole image), and then do the same with the vertical filter on the intermediate result (another LMN for all columns), so you end up with A(1D) = 2LMN operations.
This requires storing the intermediate results, either in memory or locally. While you pay the cost of storing the intermediate results and synchronizing the two passes, you get linear rather than quadratic scaling in the filter size. Therefore, typically for any filter larger than roughly 4×4 (depending on the hardware, implementation, etc.), using separable filters is going to be significantly faster than the naive, non-separable approach.
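To make the cost argument concrete, here is a small sketch (assuming SciPy is available); the image size, filter size, and the box kernel itself are arbitrary choices for illustration:

```python
import numpy as np
from scipy.ndimage import convolve, convolve1d

M = N = 256          # image size (illustrative)
L = 9                # filter size (illustrative)

# A separable kernel: outer product of a 1D box filter with itself.
f1d = np.ones(L) / L
kernel2d = np.outer(f1d, f1d)

rng = np.random.default_rng(0)
img = rng.random((M, N))

# Naive 2D convolution: ~L*L multiply-adds per pixel.
out_2d = convolve(img, kernel2d, mode='constant')

# Separable version: horizontal pass, then vertical pass, ~2*L ops per pixel.
tmp = convolve1d(img, f1d, axis=1, mode='constant')       # all rows
out_sep = convolve1d(tmp, f1d, axis=0, mode='constant')   # all columns

print("results match:", np.allclose(out_2d, out_sep))
print("A(2D) ops:", L * L * M * N)   # L^2 * M * N
print("A(1D) ops:", 2 * L * M * N)   # 2 * L * M * N
```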
Let's start with two simple examples that we know are separable: the box and the Gaussian filter. These filters are used to blur an image. The plots of the box and Gaussian kernels are shown below:
Let's see what SVD will do with the box filter matrix and print out all the singular values, along with the U column and V row associated with the first singular value:
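The exact box kernel used here is not reproduced in the post; the sketch below assumes a 50x50 array that is 1 inside a centered 34x34 region and 0 on the border, which is consistent with the printed singular value of 34 and the entries of magnitude 0.17 ≈ 1/√34:

```python
import numpy as np

# Assumed 50x50 box kernel: ones in a centered 34x34 region, zeros elsewhere.
kernel = np.zeros((50, 50))
kernel[8:42, 8:42] = 1.0

U, S, Vt = np.linalg.svd(kernel)
print("Singular values:", np.round(S, 2))
print("U =", np.round(U[:, 0], 2))    # column of U for the first singular value
print("V =", np.round(Vt[0, :], 2))   # row of V^T for the first singular value
# Note: the overall sign of each U column / V row is arbitrary, so it may
# come out positive or negative depending on the SVD implementation.
```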
Singular values: [34.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00]
U= [0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00]
V= [0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 -0.17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00]
Here, we have only one non-zero singular value, because we know the filter is fully separable. The fact that our vectors are negative may look a bit strange, but the signs cancel each other out, so we can simply multiply both of them by -1. We can also get rid of the separate singular value and embed it into our filters by multiplying both by its square root, making them comparable in value:
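Continuing the sketch above, folding the square root of the singular value into both 1D filters and fixing the arbitrary sign could look like this:

```python
# Fold the singular value into both 1D filters and fix the arbitrary sign.
vertical   = np.sqrt(S[0]) * U[:, 0]
horizontal = np.sqrt(S[0]) * Vt[0, :]
if vertical.sum() < 0:                 # SVD sign is arbitrary; flip both together
    vertical, horizontal = -vertical, -horizontal

print("Vertical filter:", np.round(vertical, 2))
print("Horizontal filter:", np.round(horizontal, 2))

# Sanity check: their outer product rebuilds the original 50x50 box kernel.
print("rank-1 rebuild ok:", np.allclose(np.outer(vertical, horizontal), kernel))
```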
Vertical filter: [-0.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00]
Horizontal filter: [-0.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00]
The above two 50x1 vectors are the separable filters that can be applied to an image to get the same result as the 50x50 kernel, with considerably less processing.
3: Non-Separable Filters:
In the above example of the box filter, we knew it was separable, i.e., it had only one non-zero singular value. But what about more complex kernels? These will have more than one significant singular value. Let's consider the circular kernel, which is widely used in bokeh filters.
When we decompose this matrix using SVD, we get 50 singular values (the kernel is 50x50), several of which are non-zero. This by itself does not help decrease computation, so we perform a low rank approximation of this matrix.
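The exact circular kernel is not specified in the post; as an illustration, a simple disk kernel and its singular values could be generated like this:

```python
import numpy as np

# Illustrative 50x50 circular ("bokeh") kernel: 1 inside a disk, 0 outside.
size, radius = 50, 20
y, x = np.mgrid[:size, :size]
disk = ((x - size / 2 + 0.5) ** 2 + (y - size / 2 + 0.5) ** 2 <= radius ** 2).astype(float)

U, S, Vt = np.linalg.svd(disk)
print("number of non-negligible singular values:", int(np.sum(S > 1e-6 * S[0])))
```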
3.1: Low Rank Approximation:
Singular values in the S matrix are arranged in descending order. Low Rank Approximation is a technique where only the first N singular values, with their associated U columns and V rows, are kept and used as the final separable filter.
If we need to approximate a 50×50 filter, then even taking 10 components could be worth it in theory, and for much larger filters it is also worth it in practice. One thing worth noting is that, for the presented filter, we reach exact reconstruction with just 14 components (as opposed to the full 50 singular values!). But let's look at the more practical 1, 2, 3 and 4 component approximations and their approximation percentages.
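A small sketch of the rank-k approximation, reusing the disk kernel from above; the "accuracy" printed here is one possible measure (relative Frobenius error), not necessarily the exact metric behind the percentages quoted in the post:

```python
def rank_k_approx(kernel, k):
    """Keep only the first k singular values/vectors of the kernel."""
    U, S, Vt = np.linalg.svd(kernel)
    return sum(S[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))

for k in (1, 2, 3, 4):
    approx = rank_k_approx(disk, k)
    err = np.linalg.norm(disk - approx) / np.linalg.norm(disk)
    print(f"rank {k}: approximation accuracy ~ {100 * (1 - err):.1f}%")
```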
Even with at most four components, we see a clear "diminishing returns" effect: the difference between 3 and 4 components is smaller than between 2 and 3. This is not surprising, since the singular values are sorted in descending order, so every additional component adds less information.
3.2: Visual effect of the Low Rank Approximated Filter:
Let's look at the difference between an image filtered with the original kernel and with the low rank approximated filters.
There isn't much visible difference between the original filter and the rank-approximated filters, but the computation required to apply them is drastically reduced.
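As a sketch of how such a low rank filter is applied in practice (assuming SciPy, and reusing the disk kernel from above): each kept component becomes one horizontal plus one vertical 1D pass, and the passes are summed.

```python
from scipy.ndimage import convolve, convolve1d

def separable_filter(img, kernel, k):
    """Apply a rank-k approximation of `kernel` as k pairs of 1D passes."""
    U, S, Vt = np.linalg.svd(kernel)
    out = np.zeros_like(img, dtype=float)
    for i in range(k):
        v = np.sqrt(S[i]) * U[:, i]      # vertical 1D filter
        h = np.sqrt(S[i]) * Vt[i, :]     # horizontal 1D filter
        out += convolve1d(convolve1d(img, h, axis=1, mode='constant'),
                          v, axis=0, mode='constant')
    return out

img = np.random.default_rng(1).random((256, 256))
full = convolve(img, disk, mode='constant')      # full 50x50 kernel
approx = separable_filter(img, disk, k=4)        # 4 separable components
print("max absolute difference:", np.abs(full - approx).max())
```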
4. Result and Discussion:
We can conclude that low rank approximations of filters are more than sufficient for image processing. Splitting an MxN 2D filter kernel into an Mx1 and an Nx1 1D filter is well suited for processing. In some cases there is no point in computing without separation, as separation gives 99.9% accuracy even with less than half of the singular values. But even SVD has a few limitations. The distribution of the singular values depends on how symmetrical, or how uniform, the filter kernel is. If the filter kernel is just a bunch of random numbers, the singular values can be spread almost evenly along the diagonal of the S matrix; in that case separation brings little benefit, and for small filters it can even increase the computation. Secondly, only vertical and horizontal separability is possible here, while some filters use oriented, orthogonal separability or non-orthogonal separability to perform better. Low Rank Approximation also brings an extra advantage: approximations that clear the smaller singular values to zero can improve image quality by reducing noise, although the effect depends greatly on the characteristics of the noise and the image.
5. Conclusion:
1. In the first post, we clearly saw how powerful linear algebra is. We got to visualize how matrix operations behave when applied to images, which provided a better understanding of the basic concepts.
2. We got to learn about filter kernels and convolution. We also saw how Singular Value Decomposition helps minimize the computational complexity of applying image filters.
3. This is just a small application of SVD. SVD and Low Rank Approximation are widely used in image compression, pattern extraction (e.g., mid-ocean ridge topography), data reduction in machine learning, least squares linear regression, data denoising, and more.
There is a lot of scope for future work. We could work on separating non-symmetrical, random kernels with better accuracy using repeated SVD or more complex linear algebra decomposition techniques. Computer Vision is a hot topic, and the demand is only going to increase in the future. Basic knowledge of Linear Algebra is a must in this domain!
Thank You!!