We will now turn our attention to solving simultaneous equations. Elimination and substitution are the typical methods we employ, but it turns out matrix multiplication offers another way to obtain a solution.
This relies on a property of matrices called the matrix inverse: multiplying a matrix by its inverse results in an identity matrix.
$$A^{-1} \ast A = I$$
where $A^{-1}$ is the inverse of matrix $A$ and $I$ is the identity matrix.
To solve a simultaneous equation $A \ast r = s$ for vector $r$, we can rearrange the equation as follows
$$\begin{aligned} A \ast r &= s \\ A^{-1} \ast (A \ast r) &= A^{-1} \ast s \\ r &= A^{-1} \ast s \end{aligned}$$
Therefore, the solution for vector $r$ can be obtained by multiplying $A^{-1}$ with $s$.
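As a quick sanity check, here is a minimal numpy sketch of this approach (numpy is assumed; the matrix and right-hand side are made-up values for illustration):

```python
import numpy as np

# A made-up system A * r = s to illustrate solving via the inverse.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
s = np.array([5.0, 6.0])

r = np.linalg.inv(A) @ s  # r = A^-1 * s

# The result satisfies the original equation.
assert np.allclose(A @ r, s)
```

In practice `np.linalg.solve(A, s)` is preferred over computing the inverse explicitly, but the underlying idea is the same.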
However, finding a matrix inverse is a non-trivial task in general. There exists one shortcut to calculate the inverse of a 2 by 2 square matrix.
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \tag{1}$$
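Equation (1) is easy to check numerically. A small sketch (assuming numpy; `inverse_2x2` is a hypothetical helper name) comparing the shortcut against the general-purpose inverse:

```python
import numpy as np

def inverse_2x2(m):
    """Invert a 2x2 matrix [[a, b], [c, d]] using equation (1)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[3.0, 1.0],
              [1.0, 1.0]])

# The shortcut formula agrees with numpy's general-purpose inverse.
assert np.allclose(inverse_2x2(A), np.linalg.inv(A))
```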
To find the inverse of a matrix in higher dimensions, QR decomposition could be one approach, but that is out of scope for our discussion here.
Matrix Determinant
One concept closely related to the matrix inverse is the matrix determinant. For a matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, we can draw a parallelogram with vectors $\begin{pmatrix} a \\ c \end{pmatrix}$ and $\begin{pmatrix} b \\ d \end{pmatrix}$. The determinant of this matrix is then defined as the area of this parallelogram.
This area is calculated by the formula below.
$$|A| = (a + b) \ast (c + d) - ac - bd - 2bc = ad - bc$$
The symbol for the determinant is two vertical bars ($\lvert A \rvert$), just like the modulus operator for vectors.
Recall that $ad - bc$ is also the term we used in the matrix inverse calculation for a 2 by 2 matrix (equation 1). Therefore our matrix inverse formula can be simplified as
$$A^{-1} = \frac{1}{|A|} \ast \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \tag{2}$$
We can also see that not all matrices are invertible. In order to find the inverse of a matrix, we have to compute its determinant first. However, for a matrix like $\begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix}$, the determinant is $1 \ast 2 - 2 \ast 1 = 0$, which results in a division by 0 when we substitute the determinant value into equation (2). This is because for a matrix to have a non-zero determinant, its basis vectors must be linearly independent. In our example 2 by 2 matrix $A$, $\begin{pmatrix} a \\ c \end{pmatrix}$ and $\begin{pmatrix} b \\ d \end{pmatrix}$ must not lie on the same line for a valid matrix inverse to exist. In the simultaneous equation $A \ast r = s$, if matrix $A$ has no inverse there is no unique solution for vector $r$: depending on $s$, there are either infinitely many solutions or none at all.
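We can observe this failure mode directly in code (numpy assumed); numpy refuses to invert such a matrix:

```python
import numpy as np

# Columns (1, 1) and (2, 2) lie on the same line, so the matrix is singular.
A = np.array([[1.0, 2.0],
              [1.0, 2.0]])
print(np.linalg.det(A))  # determinant is 0

try:
    np.linalg.inv(A)
except np.linalg.LinAlgError:
    print("matrix is not invertible")
```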
Matrices Changing Basis
Changing Basis in General
We are going to revisit the topic of changing basis here after we have grasped the concept of matrix transformation on vectors.
First, let's define 2 new basis vectors $b_1$ and $b_2$ where $b_1 = \begin{pmatrix} 3 \\ 1 \end{pmatrix}$ and $b_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$. Recall from our discussion of matrix transformations that the new basis vectors $b_1$ and $b_2$ are in fact the transformation of basis vectors $e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $e_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ by matrix $\begin{pmatrix} 3 & 1 \\ 1 & 1 \end{pmatrix}$. Note $b_1$ and $b_2$ can also be expressed in the basis of $e_1$ and $e_2$.
Now we have a vector $r$ that is defined in the $b_1$, $b_2$ basis as $r = \frac{3}{2} b_1 + \frac{1}{2} b_2$. How do we get the same vector $r$ expressed in the $e_1$, $e_2$ basis? We can substitute in vectors $b_1$ and $b_2$ expressed in the $e_1$, $e_2$ basis.
$$r_E = \frac{3}{2} b_1 + \frac{1}{2} b_2 = \frac{3}{2} \begin{pmatrix} 3 \\ 1 \end{pmatrix} + \frac{1}{2} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 5 \\ 2 \end{pmatrix}$$
Alternatively, we can convert the vector $r = \frac{3}{2} b_1 + \frac{1}{2} b_2$ from the $b_1$, $b_2$ basis to the $e_1$, $e_2$ basis by multiplying by the transformation matrix $\begin{pmatrix} 3 & 1 \\ 1 & 1 \end{pmatrix}$.
$$r_E = \begin{pmatrix} 3 & 1 \\ 1 & 1 \end{pmatrix} \ast \begin{pmatrix} \frac{3}{2} \\ \frac{1}{2} \end{pmatrix} = \begin{pmatrix} 5 \\ 2 \end{pmatrix}$$
This is illustrated in the graph below. Note the expressions for vectors in the $e_1$, $e_2$ basis are colored black while those for vectors in the $b_1$, $b_2$ basis are colored orange.
That is how we convert a vector from b1 and b2 basis to e1 and e2 basis. But what is more interesting is to convert a vector from e1 and e2 basis to b1 and b2 basis. This should somehow “reverse” our previous process.
We first need to find out where e1 and e2 are in b1 and b2 basis. This is where matrix inverse comes into play.
$$\begin{pmatrix} 3 & 1 \\ 1 & 1 \end{pmatrix}^{-1} = \frac{1}{3 - 1} \begin{pmatrix} 1 & -1 \\ -1 & 3 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 1 & -1 \\ -1 & 3 \end{pmatrix}$$
Therefore, $e_1 = \frac{1}{2} b_1 - \frac{1}{2} b_2$ and $e_2 = -\frac{1}{2} b_1 + \frac{3}{2} b_2$ in the $b_1$, $b_2$ basis. We can verify this by substituting the values of $b_1$ and $b_2$ back to get the original $e_1$ and $e_2$.
This demonstrates a complete loop of vector conversion between 2 sets of basis vectors e1, e2 and b1, b2. We can add the expression for all the discussed vectors in graph below.
To sum it up, we always need to find the matrix representation of the current basis vectors in the basis of the target vector space. If we want to convert a vector from the $b_1$, $b_2$ basis to the $e_1$, $e_2$ basis, we need to find matrix $\begin{pmatrix} 3 & 1 \\ 1 & 1 \end{pmatrix}$, which is $b_1$ and $b_2$ expressed in the $e_1$, $e_2$ basis. Conversely, if we want to convert a vector from the $e_1$, $e_2$ basis to the $b_1$, $b_2$ basis, we need to find matrix $\frac{1}{2} \begin{pmatrix} 1 & -1 \\ -1 & 3 \end{pmatrix}$, which is $e_1$ and $e_2$ expressed in the $b_1$, $b_2$ basis. Then we multiply this matrix with the vector in the current vector space to get the converted vector in the target vector space.
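A brief numpy sketch (numpy assumed) of this round trip using the matrices above:

```python
import numpy as np

# Columns of B are b1 and b2 expressed in the e1, e2 basis.
B = np.array([[3.0, 1.0],
              [1.0, 1.0]])
B_inv = np.linalg.inv(B)

r_B = np.array([3 / 2, 1 / 2])  # r in the b1, b2 basis

r_E = B @ r_B                   # convert to the e1, e2 basis
assert np.allclose(r_E, [5.0, 2.0])

# Converting back with the inverse recovers the original coordinates.
assert np.allclose(B_inv @ r_E, r_B)
```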
Transformation matrices provide us a way to change vector basis in the general case. We have learned previously that when the target basis vectors $b_1$ and $b_2$ are orthogonal to each other, there is an easier approach that does not require computing this transformation matrix: we can obtain the new expression of a vector by projecting it onto the target basis vectors $b_1$ and $b_2$ directly.
For example, we have our orthogonal target basis vectors $b_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $b_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} -1 \\ 1 \end{pmatrix}$. To verify that $b_1$ and $b_2$ are orthogonal to each other, we can compute the dot product $b_1 \cdot b_2 = 0$. A vector $r = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 3 \end{pmatrix}$ originally in the $e_1$, $e_2$ basis can then be converted to the $b_1$, $b_2$ basis as:
$$r_B = \begin{pmatrix} r \cdot b_1 \\ r \cdot b_2 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$$
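A small numpy sketch (numpy assumed) of this projection approach:

```python
import numpy as np

b1 = np.array([1.0, 1.0]) / np.sqrt(2)
b2 = np.array([-1.0, 1.0]) / np.sqrt(2)
assert np.isclose(b1 @ b2, 0.0)  # the basis vectors are orthogonal

r = np.array([1.0, 3.0]) / np.sqrt(2)

# With unit-length orthogonal basis vectors, each new coordinate
# is just a dot product with the corresponding basis vector.
r_B = np.array([r @ b1, r @ b2])
print(r_B)  # [2. 1.]
```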
So we know how to perform a change in basis when the target basis vectors are orthogonal to each other. We also know how to perform such change in a general case by using matrix inverse. This is a very important technique. We will use it often in solving other matrix related problems.
Doing Transformation in a Changed Basis
There is one more extension to our matrices changing basis concept. Going back to our previous example of new basis vectors $b_1 = \begin{pmatrix} 3 \\ 1 \end{pmatrix}$, $b_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$: for the vector $r = \begin{pmatrix} \frac{3}{2} \\ \frac{1}{2} \end{pmatrix}$ in the $b_1$, $b_2$ basis, we want to find a vector $r'$ that is the result of rotating $r$ by 90° anti-clockwise. The difficult part is that this rotation happens not in the original $e_1$, $e_2$ basis, but is referenced to the $b_1$, $b_2$ basis. How shall we do that?
We might not know how to express the 90° anti-clockwise rotation transformation matrix in the $b_1$, $b_2$ basis. Nonetheless, we know how to do this rotation in our original basis vectors $e_1$ and $e_2$. This rotation transformation matrix is given by $T_E = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ (recall from our previous discussion on matrix transformation). So here is what we can do to accomplish our goal.
We first convert the vector $r$ into the $e_1$, $e_2$ basis by transformation matrix $B = \begin{pmatrix} 3 & 1 \\ 1 & 1 \end{pmatrix}$.
$$r_E = B \ast r = \begin{pmatrix} 3 & 1 \\ 1 & 1 \end{pmatrix} \ast \begin{pmatrix} \frac{3}{2} \\ \frac{1}{2} \end{pmatrix} = \begin{pmatrix} 5 \\ 2 \end{pmatrix}$$
Then we perform a rotation transformation for vector rE in the e1 and e2 vector space.
$$r_E' = T_E \ast r_E = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \ast \begin{pmatrix} 5 \\ 2 \end{pmatrix} = \begin{pmatrix} -2 \\ 5 \end{pmatrix}$$
Lastly we convert $r_E'$ back to the $b_1$, $b_2$ basis with the matrix inverse $B^{-1}$.
$$r' = B^{-1} \ast r_E' = \frac{1}{2} \begin{pmatrix} 1 & -1 \\ -1 & 3 \end{pmatrix} \ast \begin{pmatrix} -2 \\ 5 \end{pmatrix} = \begin{pmatrix} -\frac{7}{2} \\ \frac{17}{2} \end{pmatrix}$$
Therefore, the original vector $r = \begin{pmatrix} \frac{3}{2} \\ \frac{1}{2} \end{pmatrix}$ in the $b_1$, $b_2$ basis after being rotated by 90° anti-clockwise becomes $r' = \begin{pmatrix} -\frac{7}{2} \\ \frac{17}{2} \end{pmatrix}$ in the same basis. This is plotted in the graph below.
In general, if we have our basis changing matrix $B$ and the desired transformation matrix $T$ in the normal basis, we can perform a linear transformation in the changed basis from vector $r$ to $r'$ by:
$$r' = B^{-1} \ast T \ast B \ast r$$
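The whole sequence collapses into one line of numpy (assumed) using the example values from above:

```python
import numpy as np

B = np.array([[3.0, 1.0],      # b1, b2 as columns in the e1, e2 basis
              [1.0, 1.0]])
T_E = np.array([[0.0, -1.0],   # 90 degree anti-clockwise rotation
                [1.0,  0.0]])

r = np.array([3 / 2, 1 / 2])   # r in the b1, b2 basis

# Convert to e1, e2; rotate there; convert back to b1, b2.
r_prime = np.linalg.inv(B) @ T_E @ B @ r
print(r_prime)  # [-3.5  8.5], i.e. (-7/2, 17/2)
```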
This idea of doing a transformation in a changed basis might be hard to grasp, but it is a critical step that sets us up for further machine learning concepts. For example, Principal Component Analysis (PCA) makes heavy use of different basis vectors. It would be helpful to think over this entire process and develop a solid understanding of it.
Orthogonal Matrices
Define an Orthogonal Matrix
In order to define an orthogonal matrix, we need to first introduce the concept of the matrix transpose. If we interchange the rows and columns of a matrix, the resulting matrix is called the transpose of the original matrix. We denote this operation by a superscript $t$.
For example, we have a matrix $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$. The transpose of $A$ is therefore $A^t = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}$, where the elements off the diagonal are interchanged.
Let's define a special n by n square matrix $A$ whose columns are vectors: $A = \begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix}$.
Each column $a_i$ in $A$ is a vector perpendicular to every other column vector $a_j$: $a_i \cdot a_j = 0 \;\; \forall i \neq j$. And all the column vectors have unit length: $|a_i| = 1$.
Now something interesting happens when we multiply matrix $A$ by its transpose matrix $A^t$:
$$A^t \ast A = I$$
The entry at row $i$, column $j$ of the product is just the dot product $a_i \cdot a_j$, which is 1 when $i = j$ and 0 otherwise. So we get an n by n identity matrix as a result. This means $A^t$ is also an inverse of $A$.
Therefore, we can define an orthogonal matrix as one consisting of a set of unit length basis vectors that are all perpendicular to each other.
Since all the basis vectors are of unit length and perpendicular to each other, the determinant of an orthogonal matrix must be either +1 or -1. We can derive this as follows.
$$1 = \det(I) = \det(A^t \ast A) = \det(A^t) \ast \det(A) = \det(A)^2$$
Therefore,
$$\det(A) = \pm 1$$
Whether the determinant is +1 or -1 depends on how we permute the column vectors in A, but this is beyond our discussion here.
We can derive another property of an orthogonal matrix, making use of the fact that $A^t$ is the inverse of $A$: multiplying in the other order, $A \ast A^t = I$, must also hold. The rows of $A$ are the columns of $A^t$, so this shows that the row vectors of matrix $A$ must also be perpendicular to each other and of unit length. Therefore, $A^t$ is also an orthogonal matrix.
Recall from our previous discussion on changing basis that the vector in a new vector space can be easily computed by vector projection, provided the new basis vectors are all perpendicular to each other. This is exactly what we get with an orthogonal matrix. Each column in the orthogonal matrix can be treated as a basis vector, and they are perpendicular to each other. Since each has unit length, the result of the vector projection is simply the dot product ($\frac{r \cdot b_i}{|b_i|} \ast \frac{1}{|b_i|} = r \cdot b_i$ when $|b_i| = 1$, for $i \in 1, 2, \cdots, n$).
Therefore in data science we would like to use an orthogonal matrix as the basis vector set for transforming our data. There are a few advantages of doing so.
The matrix inverse can be computed easily because $A^{-1} = A^t$.
The matrix transformation is reversible because it does not collapse space ($\det(A) = \pm 1$).
Changing basis can be computed easily with vector projection.
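These properties are easy to confirm numerically. A small sketch (assuming numpy) using a 2-D rotation matrix, a classic example of an orthogonal matrix:

```python
import numpy as np

theta = 0.3
# A rotation matrix is orthogonal: its columns are unit length
# and perpendicular to each other.
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

assert np.allclose(A.T @ A, np.eye(2))         # A^t * A = I
assert np.isclose(abs(np.linalg.det(A)), 1.0)  # det(A) is +1 or -1
assert np.allclose(np.linalg.inv(A), A.T)      # the inverse is the transpose
```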
How to Construct an Orthogonal Matrix
We already know that it's convenient if our computation involves an orthogonal matrix. But how do we get one? We are going to walk through the process of constructing an orthogonal matrix here, called the Gram-Schmidt process.
We first define a set of vectors $V = \{v_1, v_2, \cdots, v_n\}$ in which all vectors are linearly independent of each other (this can be verified by checking that the matrix with these vectors as columns has a non-zero determinant). However, they are neither perpendicular to each other nor of unit length at this stage. We are going to construct a set of orthogonal basis vectors out of this vector set $V$.
We start with the vector v1 and define our first basis vector e1 as
$$e_1 = \frac{v_1}{|v_1|}$$
So e1 is just the normalized vector of v1 (with length 1).
For the second vector $v_2$, we can treat it as the sum of two vectors: one in the same direction as $e_1$ and the other perpendicular to $e_1$ ($v_2 = e_{1\parallel} + e_{1\perp}$). To find the vector in the same direction as $e_1$, we project $v_2$ onto $e_1$.
$$e_{1\parallel} = \frac{v_2 \cdot e_1}{|e_1|} \ast \frac{e_1}{|e_1|}$$
Since $e_1$ has length 1, $|e_1| = 1$, this simplifies to
$$e_{1\parallel} = (v_2 \cdot e_1) \ast e_1$$
Then the vector perpendicular to $e_1$ can be calculated by subtracting the parallel component $(v_2 \cdot e_1) e_1$ from $v_2$. We denote this vector as $u_2$. So
$$u_2 = v_2 - (v_2 \cdot e_1) \ast e_1$$
The second basis vector e2 is thus obtained by normalizing u2.
$$e_2 = \frac{u_2}{|u_2|}$$
Let’s move on to the third vector v3. To find the vector u3 that is perpendicular to both e1 and e2, we subtract from v3 the respective components of v3 which are in the same direction as e1 and e2.
$$u_3 = v_3 - (v_3 \cdot e_1) e_1 - (v_3 \cdot e_2) e_2$$
Then we find the basis vector e3 by normalizing u3.
$$e_3 = \frac{u_3}{|u_3|}$$
We can continue the same process for more vectors $v_4, v_5, \ldots, v_n$ until we have a set of vectors that are all perpendicular to each other and of unit length. This set of vectors then forms the columns of the orthogonal matrix we need.
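The steps above can be sketched as a short function (numpy assumed; `gram_schmidt` is a hypothetical helper name):

```python
import numpy as np

def gram_schmidt(vectors):
    """Turn linearly independent vectors into an orthonormal basis."""
    basis = []
    for v in vectors:
        u = v.astype(float)
        for e in basis:
            u = u - (v @ e) * e  # remove the component along each earlier e
        basis.append(u / np.linalg.norm(u))
    return np.array(basis).T     # basis vectors as columns

V = [np.array([1, 1, 1]), np.array([2, 0, 1]), np.array([3, 1, -1])]
E = gram_schmidt(V)

# The result is an orthogonal matrix: E^t * E = I.
assert np.allclose(E.T @ E, np.eye(3))
```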
Let’s put together everything we have learned so far with a concrete example.
We have 3 vectors, $v_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$, $v_2 = \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}$ and $v_3 = \begin{pmatrix} 3 \\ 1 \\ -1 \end{pmatrix}$ that define a 3-D space (verify that these three vectors are linearly independent). There is a vector $r = \begin{pmatrix} 2 \\ 3 \\ 5 \end{pmatrix}$ lying in the same space defined by $v_1$, $v_2$ and $v_3$. Our task is to find a vector $r'$ that is the mirror reflection of vector $r$ in the 2-D plane defined by vectors $v_1$ and $v_2$.
This problem seems complicated because it is very hard to find a transformation matrix that reflects a vector in the $v_1$, $v_2$ plane directly. However, making use of what we have already learned, this problem can be broken down into these steps:
1. Define an orthogonal matrix $E$ consisting of basis vectors $e_1$, $e_2$ and $e_3$ derived from $v_1$, $v_2$ and $v_3$.
2. Convert vector $r$ to $r_E$ in the new vector space defined by $E$.
3. Perform the mirror reflection of $r_E$ in this new space to get the transformed vector $r_E'$.
4. Convert $r_E'$ back to the original space as $r'$.
We start by creating the orthogonal matrix $E$ with the Gram-Schmidt process.
The first basis vector $e_1$ is just the normalized $v_1$:
$$e_1 = \frac{v_1}{|v_1|} = \frac{1}{\sqrt{3}} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$
Following the same process, $u_2 = v_2 - (v_2 \cdot e_1) e_1 = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}$ gives $e_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}$, and $u_3 = v_3 - (v_3 \cdot e_1) e_1 - (v_3 \cdot e_2) e_2 = \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix}$ gives $e_3 = \frac{1}{\sqrt{6}} \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix}$.
In the basis of $e_1$, $e_2$ and $e_3$, the reflection matrix can be defined as $T_E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}$ because the components in the $e_1$ and $e_2$ directions remain the same while the component in the $e_3$ direction is inverted.
Let's do the reflection of $r_E$ by transformation matrix $T_E$. Converting $r$ into the new basis first gives $r_E = E^t \ast r = \begin{pmatrix} 10/\sqrt{3} \\ -1/\sqrt{2} \\ -5/\sqrt{6} \end{pmatrix}$. The reflection flips the $e_3$ component, $r_E' = T_E \ast r_E = \begin{pmatrix} 10/\sqrt{3} \\ -1/\sqrt{2} \\ 5/\sqrt{6} \end{pmatrix}$, and converting back to the original space yields $r' = E \ast r_E' = \frac{1}{3} \begin{pmatrix} 11 \\ 14 \\ 5 \end{pmatrix}$.
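The whole pipeline can be verified with a short numpy sketch (numpy assumed; the basis vectors come from applying the Gram-Schmidt process to $v_1$, $v_2$, $v_3$):

```python
import numpy as np

# Orthonormal basis obtained from Gram-Schmidt on v1, v2, v3.
e1 = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)
e2 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)
e3 = np.array([1.0, 1.0, -2.0]) / np.sqrt(6)
E = np.column_stack([e1, e2, e3])

T_E = np.diag([1.0, 1.0, -1.0])  # reflection flips the e3 component

r = np.array([2.0, 3.0, 5.0])

# Change basis, reflect, change back; E is orthogonal so E^-1 = E^t.
r_prime = E @ T_E @ E.T @ r
print(r_prime)  # approximately (11/3, 14/3, 5/3)
```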
This wraps up our discussion on matrix inverses and matrix transformations. It is really fun and useful. We can apply these techniques to a lot of image-related problems in machine learning that require transformations of shape, orientation, position, etc. It also sets us up for the next topic of eigenvectors and eigenvalues.
(Inspired by Mathematics for Machine Learning lecture series from Imperial College London)