3. Review of Matrix Algebra
Vectors and matrices are essential for modern analysis of systems of equations —
algebraic, differential, functional, etc. In this part, we will review the most basic facts of
matrix arithmetic. See [42] for full details.
3.1. Matrices and Vectors.
A matrix is a rectangular array of numbers. Thus,
$$
\begin{pmatrix} 1 & 0 & 3 \\ -2 & 4 & 1 \end{pmatrix}, \qquad
\begin{pmatrix} \pi & 0 \\ e & \frac{1}{2} \\ -1 & .83 \\ \sqrt{5} & -\frac{4}{7} \end{pmatrix}, \qquad
(\; .2 \ \ {-1.6} \ \ .32 \;), \qquad
\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad
\begin{pmatrix} 1 & 3 \\ -2 & 5 \end{pmatrix},
$$
are all examples of matrices. We use the notation
$$
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix} \tag{3.1}
$$
for a general matrix of size m×n (read “m by n”), where m denotes the number of rows in
A and n denotes the number of columns. Thus, the preceding examples of matrices have
respective sizes 2 × 3, 4 × 2, 1 × 3, 2 × 1, and 2 × 2. A matrix is square if m = n, i.e., it
has the same number of rows as columns. A column vector is an m × 1 matrix, while a row
vector is a 1×n matrix. As we shall see, column vectors are by far the more important of
the two, and the term “vector” without qualification will always mean “column vector”.
A 1 × 1 matrix, which has but a single entry, is both a row and a column vector.
The number that lies in the ith row and the jth column of A is called the (i, j) entry of A, and is denoted by $a_{ij}$. The row index always appears first and the column index second. Two matrices are equal, A = B, if and only if they have the same size, and all their entries are the same: $a_{ij} = b_{ij}$ for i = 1, . . . , m and j = 1, . . . , n.
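As an aside for readers following along on a computer (our addition, not part of the original notes): most programming languages index from 0 rather than 1, so the (i, j) entry $a_{ij}$ corresponds to index [i−1, j−1]. A minimal sketch assuming NumPy:

```python
import numpy as np

A = np.array([[1, 0, 3],
              [-2, 4, 1]])   # the 2 x 3 example matrix from above

m, n = A.shape               # m = 2 rows, n = 3 columns
print(A[0, 1])               # prints 0: the (1, 2) entry, since NumPy indexes from 0
```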
A general linear system of m equations in n unknowns will take the form
$$
\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1, \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= b_2, \\
&\ \ \vdots \\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n &= b_m.
\end{aligned} \tag{3.2}
$$
As such, it is composed of three basic ingredients: the m × n coefficient matrix A, with entries $a_{ij}$ as in (3.1), the column vector $x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$ containing the unknowns, and the column vector $b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}$ containing the right hand sides. As an example, consider the linear system
$$
\begin{aligned}
x + 2y + z &= 2, \\
2y + z &= 7, \\
x + y + 4z &= 3.
\end{aligned}
$$
The coefficient matrix $A = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \\ 1 & 1 & 4 \end{pmatrix}$ can be filled in, entry by entry, from the coefficients of the variables appearing in the equations. (Don't forget to put a zero when a variable doesn't appear in an equation!) The vector $x = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$ lists the variables, while the entries of $b = \begin{pmatrix} 2 \\ 7 \\ 3 \end{pmatrix}$ are the right hand sides of the equations.
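For readers experimenting on a computer, the same data can be entered as arrays; a minimal sketch assuming NumPy:

```python
import numpy as np

# Coefficient matrix of the example system; note the zero entry
# recording that x does not appear in the second equation.
A = np.array([[1., 2., 1.],
              [0., 2., 1.],
              [1., 1., 4.]])

# Right hand sides of the three equations.
b = np.array([2., 7., 3.])

print(A.shape)   # (3, 3): a square 3 x 3 matrix
```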
Remark: We will consistently use bold face lower case letters to denote vectors, and
ordinary capital letters to denote general matrices.
Matrix Arithmetic
Matrix arithmetic involves three basic operations: matrix addition, scalar multiplication, and matrix multiplication. First we define addition of matrices. You are only allowed
to add two matrices of the same size, and matrix addition is performed entry by entry.
For example,
$$
\begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix} + \begin{pmatrix} 3 & -5 \\ 2 & 1 \end{pmatrix} = \begin{pmatrix} 4 & -3 \\ 1 & 1 \end{pmatrix}.
$$
Therefore, if A and B are m × n matrices, their sum C = A + B is the m × n matrix whose entries are given by $c_{ij} = a_{ij} + b_{ij}$ for i = 1, . . . , m and j = 1, . . . , n. When defined, matrix
addition is commutative, A + B = B + A, and associative, A + (B + C) = (A + B) + C,
just like ordinary addition.
A scalar is a fancy name for an ordinary number — the term merely distinguishes it
from a vector or a matrix. For the time being, we will restrict our attention to real scalars
and matrices with real entries, but eventually complex scalars and complex matrices must
be dealt with. We will consistently identify a scalar c ∈ R with the 1 × 1 matrix ( c ) in
which it is the sole entry, and so will omit the redundant parentheses in the latter case.
Scalar multiplication takes a scalar c and an m × n matrix A and computes the m × n
matrix B = cA by multiplying each entry of A by c. For example,
$$
3 \begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} 3 & 6 \\ -3 & 0 \end{pmatrix}.
$$
In general, $b_{ij} = c\, a_{ij}$ for i = 1, . . . , m and j = 1, . . . , n. Basic properties of scalar
multiplication are summarized at the end of this section.
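Both operations act entry by entry, so they are easy to check numerically; a brief sketch, again assuming NumPy:

```python
import numpy as np

A = np.array([[1, 2], [-1, 0]])
B = np.array([[3, -5], [2, 1]])

print(A + B)                          # [[ 4 -3] [ 1  1]], as computed above
print(np.array_equal(A + B, B + A))   # True: matrix addition is commutative
print(3 * A)                          # [[ 3  6] [-3  0]]: each entry times 3
```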
Finally, we define matrix multiplication. First, the product between a row vector a
and a column vector x having the same number of entries is the scalar or 1 × 1 matrix
defined by the following rule:
$$
a\,x = (\; a_1 \ \ a_2 \ \ \ldots \ \ a_n \;) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = \sum_{k=1}^{n} a_k x_k. \tag{3.3}
$$
More generally, if A is an m × n matrix and B is an n × p matrix, so that the number of
columns in A equals the number of rows in B, then the matrix product C = AB is defined
as the m × p matrix whose (i, j) entry equals the vector product of the ith row of A and
the jth column of B. Therefore,
$$
c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}. \tag{3.4}
$$
Note that our restriction on the sizes of A and B guarantees that the relevant row and
column vectors will have the same number of entries, and so their product is defined.
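Formula (3.4) translates directly into a triple loop over i, j, k. The following sketch (the helper name matmul is ours, for illustration only; NumPy's built-in @ operator performs the same computation) makes the row-times-column rule explicit:

```python
import numpy as np

def matmul(A, B):
    """Matrix product C = A B, computed entry by entry via formula (3.4)."""
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "columns of A must equal rows of B"
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            for k in range(n):        # row i of A times column j of B
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1., -1., 2.], [2., 0., -2.]])     # 2 x 3
B = np.array([[3., 4.], [0., 2.], [-1., 1.]])    # 3 x 2
print(matmul(A, B))                              # [[1. 4.] [8. 6.]]
print(np.allclose(matmul(A, B), A @ B))          # True: agrees with NumPy
```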
For example, the product of the coefficient matrix A and vector of unknowns x for our original system (4.1) is given by
$$
A x = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x + 2y + z \\ 2x + 6y + z \\ x + y + 4z \end{pmatrix}.
$$
The result is a column vector whose entries reproduce the left hand sides of the original
linear system! As a result, we can rewrite the system
$$
A x = b \tag{3.5}
$$
as an equality between two column vectors. This result is general; a linear system (3.2)
consisting of m equations in n unknowns can be written in the matrix form (3.5) where A
is the m×n coefficient matrix (3.1), x is the n×1 column vector of unknowns, and b is the
m × 1 column vector containing the right hand sides. This is one of the principal reasons
for the non-evident definition of matrix multiplication. Component-wise multiplication of
matrix entries turns out to be almost completely useless in applications.
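In computational practice, the matrix form (3.5) is exactly what a linear solver consumes. A short sketch, reusing the example system from above and assuming NumPy:

```python
import numpy as np

A = np.array([[1., 2., 1.],
              [0., 2., 1.],
              [1., 1., 4.]])
b = np.array([2., 7., 3.])

x = np.linalg.solve(A, b)       # solve the matrix equation A x = b
print(np.allclose(A @ x, b))    # True: A x reproduces the right hand sides
```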
Now, the bad news. Matrix multiplication is not commutative — that is, BA is not
necessarily equal to AB. For example, BA may not be defined even when AB is. Even if
both are defined, they may be matrices of different sizes. For example, the product s = r c
of a row vector r, a 1 × n matrix, and a column vector c, an n × 1 matrix with the same
number of entries, is a 1 × 1 matrix or scalar, whereas the reversed product C = c r is an
n × n matrix. For instance,
$$
(\; 1 \ \ 2 \;) \begin{pmatrix} 3 \\ 0 \end{pmatrix} = 3, \qquad \text{whereas} \qquad \begin{pmatrix} 3 \\ 0 \end{pmatrix} (\; 1 \ \ 2 \;) = \begin{pmatrix} 3 & 6 \\ 0 & 0 \end{pmatrix}.
$$
In computing the latter product, don’t forget that we multiply the rows of the first matrix
by the columns of the second. Moreover, even if the matrix products AB and BA have
the same size, which requires both A and B to be square matrices, we may still have
$AB \neq BA$. For example,
$$
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} -2 & 5 \\ -4 & 11 \end{pmatrix} \neq \begin{pmatrix} 3 & 4 \\ 5 & 6 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -1 & 2 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}.
$$
On the other hand, matrix multiplication is associative, so A(BC) = (AB)C when-
ever A has size m × n, B has size n × p, and C has size p × q; the result is a matrix of
size m × q. The proof of associativity is a tedious computation based on the definition of
matrix multiplication that, for brevity, we omit. Consequently, the one difference between
matrix algebra and ordinary algebra is that you need to be careful not to change the order
of multiplicative factors without proper justification.
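Both facts are easy to confirm numerically with the matrices above (the third matrix C below is ours, chosen arbitrarily for illustration); a sketch assuming NumPy:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [-1, 2]])

print(A @ B)    # [[-2  5] [-4 11]]
print(B @ A)    # [[ 3  4] [ 5  6]]  -- the products AB and BA differ

C = np.array([[2, 0], [1, 1]])                      # arbitrary third matrix
print(np.array_equal((A @ B) @ C, A @ (B @ C)))     # True: multiplication is associative
```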
Since matrix multiplication acts by multiplying rows by columns, one can compute the
columns in a matrix product AB by multiplying the matrix A and the individual columns
of B. For example, the two columns of the matrix product
$$
\begin{pmatrix} 1 & -1 & 2 \\ 2 & 0 & -2 \end{pmatrix} \begin{pmatrix} 3 & 4 \\ 0 & 2 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 4 \\ 8 & 6 \end{pmatrix}
$$
are obtained by multiplying the first matrix with the individual columns of the second:
$$
\begin{pmatrix} 1 & -1 & 2 \\ 2 & 0 & -2 \end{pmatrix} \begin{pmatrix} 3 \\ 0 \\ -1 \end{pmatrix} = \begin{pmatrix} 1 \\ 8 \end{pmatrix}, \qquad
\begin{pmatrix} 1 & -1 & 2 \\ 2 & 0 & -2 \end{pmatrix} \begin{pmatrix} 4 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}.
$$
In general, if we use $\mathbf{b}_k$ to denote the kth column of B, then
$$
A B = A \begin{pmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \ldots & \mathbf{b}_p \end{pmatrix} = \begin{pmatrix} A \mathbf{b}_1 & A \mathbf{b}_2 & \ldots & A \mathbf{b}_p \end{pmatrix}, \tag{3.6}
$$
indicating that the kth column of their matrix product is $A \mathbf{b}_k$.
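Formula (3.6) can be checked with the matrices of the preceding example; a brief sketch assuming NumPy:

```python
import numpy as np

A = np.array([[1, -1, 2], [2, 0, -2]])
B = np.array([[3, 4], [0, 2], [-1, 1]])

for k in range(B.shape[1]):
    # kth column of the product equals A times the kth column of B
    print(np.array_equal((A @ B)[:, k], A @ B[:, k]))   # True, True
```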
There are two special matrices. The first is the zero matrix, all of whose entries are 0. We use $O_{m \times n}$ to denote the m × n zero matrix, often written as just O if the size is clear
from the context. The zero matrix is the additive unit, so A + O = A = O + A when O
has the same size as A. In particular, we will use a bold face 0 to denote a column vector
with all zero entries.
The role of the multiplicative unit is played by the square identity matrix
$$
I = I_n = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 \\
0 & 0 & 0 & \cdots & 0 & 1
\end{pmatrix}
$$
of size n × n. The entries along the main diagonal (which runs from top left to bottom
right) are equal to 1, while the off-diagonal entries are all 0. As you can check, if A is any
m × n matrix, then $I_m A = A = A\, I_n$. We will sometimes write the preceding equation as just $I A = A = A\, I$, since each matrix product is well-defined for exactly one size of
identity matrix.
The identity matrix is a particular example of a diagonal matrix. In general, a square matrix A is diagonal if all its off-diagonal entries are zero: $a_{ij} = 0$ for all $i \neq j$. We will sometimes write $D = \text{diag}\,(c_1, \ldots, c_n)$ for the n × n diagonal matrix with diagonal entries $d_{ii} = c_i$. Thus, $\text{diag}\,(1, 3, 0)$ refers to the diagonal matrix
$$
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \end{pmatrix},
$$
while the n × n identity matrix can be written as $I_n = \text{diag}\,(1, 1, \ldots, 1)$.
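Both special matrices are built into NumPy (np.eye and np.diag); a quick numerical check of the identities above:

```python
import numpy as np

I3 = np.eye(3, dtype=int)       # the 3 x 3 identity matrix
D = np.diag([1, 3, 0])          # the diagonal matrix diag(1, 3, 0)
print(D)

A = np.arange(6).reshape(2, 3)                        # an arbitrary 2 x 3 matrix
print(np.array_equal(np.eye(2, dtype=int) @ A, A))    # I_m A = A
print(np.array_equal(A @ I3, A))                      # A I_n = A
```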
Let us conclude this section by summarizing the basic properties of matrix arithmetic.
In the accompanying table, A,B,C are matrices; c, d are scalars; O is a zero matrix; and
I is an identity matrix. All matrices are assumed to have the correct sizes so that the
indicated operations are defined.
Basic Matrix Arithmetic

Matrix Addition:        Commutativity      A + B = B + A
                        Associativity      (A + B) + C = A + (B + C)
                        Zero Matrix        A + O = A = O + A
                        Inverse            A + (−A) = O,  −A = (−1) A

Scalar Multiplication:  Associativity      c (d A) = (c d) A
                        Distributivity     c (A + B) = (c A) + (c B)
                                           (c + d) A = (c A) + (d A)
                        Unit               1 A = A
                        Zero               0 A = O

Matrix Multiplication:  Associativity      (A B) C = A (B C)
                        Distributivity     A (B + C) = A B + A C
                                           (A + B) C = A C + B C
                        Identity Matrix    A I = A = I A
                        Zero Matrix        A O = O,  O A = O

Matrix Inverses

The inverse of a matrix is analogous to the reciprocal $a^{-1} = 1/a$ of a scalar. We already encountered the inverses of matrices corresponding to elementary row operations. In this section, we will study inverses of general square matrices. We begin with the formal definition.

Definition 3.1. Let A be a square matrix of size n × n. An n × n matrix X is called the inverse of A if it satisfies
$$
X A = I = A X, \tag{3.7}
$$
where $I = I_n$ is the n × n identity matrix. The inverse is commonly denoted by $X = A^{-1}$.

Remark: Noncommutativity of matrix multiplication requires that we impose both conditions in (3.7) in order to properly define an inverse to the matrix A. The first condition, $X A = I$, says that X is a left inverse, while the second, $A X = I$, requires that X also be a right inverse. Rectangular matrices might have either a left inverse or a right inverse, but, as we shall see, only square matrices have both, and so only square matrices can have full-fledged inverses. However, not every square matrix has an inverse. Indeed, not every scalar has an inverse: $0^{-1} = 1/0$ is not defined, since the equation 0 x = 1 has no solution.
Example 3.2. Since
$$
\begin{pmatrix} 1 & 2 & -1 \\ -3 & 1 & 2 \\ -2 & 2 & 1 \end{pmatrix}
\begin{pmatrix} 3 & 4 & -5 \\ 1 & 1 & -1 \\ 4 & 6 & -7 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 3 & 4 & -5 \\ 1 & 1 & -1 \\ 4 & 6 & -7 \end{pmatrix}
\begin{pmatrix} 1 & 2 & -1 \\ -3 & 1 & 2 \\ -2 & 2 & 1 \end{pmatrix},
$$
we conclude that when $A = \begin{pmatrix} 1 & 2 & -1 \\ -3 & 1 & 2 \\ -2 & 2 & 1 \end{pmatrix}$, then $A^{-1} = \begin{pmatrix} 3 & 4 & -5 \\ 1 & 1 & -1 \\ 4 & 6 & -7 \end{pmatrix}$. Observe that there is no obvious way to anticipate the entries of $A^{-1}$ from the entries of A.
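Numerically, the inverse in Example 3.2 can be recovered, and both conditions in (3.7) verified, as follows (a sketch assuming NumPy):

```python
import numpy as np

A = np.array([[1., 2., -1.],
              [-3., 1., 2.],
              [-2., 2., 1.]])

X = np.linalg.inv(A)                    # numerical inverse
print(np.round(X).astype(int))          # recovers the matrix A^{-1} shown above
print(np.allclose(X @ A, np.eye(3)))    # True: X is a left inverse
print(np.allclose(A @ X, np.eye(3)))    # True: X is a right inverse
```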
Example 3.3. Let us compute the inverse $X = \begin{pmatrix} x & y \\ z & w \end{pmatrix}$, when it exists, of a general 2 × 2 matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. The right inverse condition
$$
A X = \begin{pmatrix} a x + b z & a y + b w \\ c x + d z & c y + d w \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I
$$
holds if and only if x, y, z, w satisfy the linear system
$$
\begin{aligned}
a x + b z &= 1, & \qquad a y + b w &= 0, \\
c x + d z &= 0, & \qquad c y + d w &= 1.
\end{aligned}
$$
Solving by Gaussian Elimination (or directly), we find
$$
x = \frac{d}{ad - bc}, \qquad y = -\frac{b}{ad - bc}, \qquad z = -\frac{c}{ad - bc}, \qquad w = \frac{a}{ad - bc},
$$
provided the common denominator $ad - bc$ does not vanish. Therefore, the matrix
$$
X = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}
$$
forms a right inverse to A. However, a short computation shows that it also defines a left
inverse:
$$
X A = \begin{pmatrix} x a + y c & x b + y d \\ z a + w c & z b + w d \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I,
$$
and hence $X = A^{-1}$ is the inverse to A.
The denominator appearing in the preceding formulae has a special name; it is called the determinant of the 2 × 2 matrix A, and is denoted by
$$
\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc. \tag{3.8}
$$
Thus, the determinant of a 2 × 2 matrix is the product of the diagonal entries minus the product of the off-diagonal entries. Consequently, the 2 × 2 matrix A is invertible, with
$$
A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}, \tag{3.9}
$$
if and only if $\det A \neq 0$. For example, if $A = \begin{pmatrix} 1 & 3 \\ -2 & -4 \end{pmatrix}$, then $\det A = 2 \neq 0$. We conclude that A has an inverse, which, by (3.9), is
$$
A^{-1} = \frac{1}{2} \begin{pmatrix} -4 & -3 \\ 2 & 1 \end{pmatrix} = \begin{pmatrix} -2 & -\frac{3}{2} \\ 1 & \frac{1}{2} \end{pmatrix}.
$$
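Formula (3.9) is simple enough to implement directly; in the sketch below the helper name inv2 is ours, and the result is checked against NumPy's general-purpose inverse:

```python
import numpy as np

def inv2(A):
    """Inverse of a 2 x 2 matrix via formula (3.9); fails when det A = 0."""
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[1., 3.], [-2., -4.]])
print(inv2(A))                                   # [[-2.  -1.5] [ 1.   0.5]]
print(np.allclose(inv2(A), np.linalg.inv(A)))    # True
```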
Lemma 3.4. The inverse of a square matrix, if it exists, is unique.

Proof: Suppose both X and Y satisfy (3.7), so $X A = I = A X$ and $Y A = I = A Y$. Then, by associativity, $X = X I = X (A Y) = (X A) Y = I Y = Y$, and hence X = Y. Q.E.D.

Inverting a matrix twice brings us back to where we started.

Lemma 3.5. If A is invertible, then $A^{-1}$ is also invertible and $(A^{-1})^{-1} = A$.

Proof: The matrix inverse equations $A^{-1} A = I = A A^{-1}$ are sufficient to prove that A is the inverse of $A^{-1}$. Q.E.D.
Lemma 3.6. If A and B are invertible matrices of the same size, then their product, A B, is invertible, and
$$
(A B)^{-1} = B^{-1} A^{-1}. \tag{3.10}
$$
Note that the order of the factors is reversed under inversion.

Proof: Let $X = B^{-1} A^{-1}$. Then, by associativity,
$$
X (A B) = B^{-1} A^{-1} A B = B^{-1} B = I, \qquad (A B) X = A B B^{-1} A^{-1} = A A^{-1} = I.
$$
Thus X is both a left and a right inverse for the product matrix A B, and the result follows. Q.E.D.
Example 3.7. One verifies, directly, that the inverse of $A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}$ is $A^{-1} = \begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix}$, while the inverse of $B = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ is $B^{-1} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$. Therefore, the inverse of their product
$$
C = A B = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} -2 & 1 \\ -1 & 0 \end{pmatrix}
$$
is given by
$$
C^{-1} = B^{-1} A^{-1} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ 1 & -2 \end{pmatrix}.
$$
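The reversal of factors in (3.10) is easy to witness numerically with the matrices of Example 3.7; a sketch assuming NumPy:

```python
import numpy as np

A = np.array([[1., 2.], [0., 1.]])
B = np.array([[0., 1.], [-1., 0.]])

lhs = np.linalg.inv(A @ B)                   # (AB)^{-1}
rhs = np.linalg.inv(B) @ np.linalg.inv(A)    # B^{-1} A^{-1}
print(np.allclose(lhs, rhs))                 # True: the order is reversed

wrong = np.linalg.inv(A) @ np.linalg.inv(B)  # A^{-1} B^{-1}, the wrong order
print(np.allclose(lhs, wrong))               # False for this pair
```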
We can straightforwardly generalize the preceding result. The inverse of a k-fold product of invertible matrices is the product of their inverses, in the reverse order:
$$
(A_1 A_2 \cdots A_{k-1} A_k)^{-1} = A_k^{-1} A_{k-1}^{-1} \cdots A_2^{-1} A_1^{-1}. \tag{3.11}
$$
Warning: In general, $(A + B)^{-1} \neq A^{-1} + B^{-1}$. This equation is not even true for scalars (1 × 1 matrices)!
Transposes and Symmetric Matrices
Another basic operation on matrices is to interchange their rows and columns. If A is an m × n matrix, then its transpose, denoted $A^T$, is the n × m matrix whose (i, j) entry equals the (j, i) entry of A; thus
$$
B = A^T \quad \text{means that} \quad b_{ij} = a_{ji}.
$$
For example, if
$$
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, \qquad \text{then} \qquad A^T = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}.
$$
Observe that the rows of A become the columns of $A^T$ and vice versa. In particular, the transpose of a row vector is a column vector, while the transpose of a column vector is a row vector; if $v = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$, then $v^T = (\; 1 \ \ 2 \ \ 3 \;)$. The transpose of a scalar, considered as a 1 × 1 matrix, is itself: $c^T = c$.
Remark: Most vectors appearing in applied mathematics are column vectors. To conserve vertical space in this text, we will often use the transpose notation, e.g., $v = (\, v_1, v_2, v_3 \,)^T$, as a compact way of writing column vectors.
In the square case, transposition can be viewed as “reflecting” the matrix entries
across the main diagonal. For example,
$$
\begin{pmatrix} 1 & 2 & -1 \\ 3 & 0 & 5 \\ -2 & -4 & 8 \end{pmatrix}^{T} = \begin{pmatrix} 1 & 3 & -2 \\ 2 & 0 & -4 \\ -1 & 5 & 8 \end{pmatrix}.
$$
In particular, the transpose of a lower triangular matrix is upper triangular and vice-versa.
Transposing twice returns you to where you started:
$$
(A^T)^T = A. \tag{3.12}
$$
Unlike inversion, transposition is compatible with matrix addition and scalar multiplication:
$$
(A + B)^T = A^T + B^T, \qquad (c A)^T = c\, A^T. \tag{3.13}
$$
Transposition is also compatible with matrix multiplication, but with a twist. Like the inverse, the transpose reverses the order of multiplication:
$$
(A B)^T = B^T A^T. \tag{3.14}
$$
Indeed, if A has size m × n and B has size n × p, so they can be multiplied, then $A^T$ has size n × m and $B^T$ has size p × n, and so, in general, one has no choice but to multiply $B^T A^T$ in that order. Formula (3.14) is a straightforward consequence of the basic laws of matrix multiplication. An important special case is the product between a row vector $v^T$ and a column vector w with the same number of entries. In this case,
$$
v^T w = (v^T w)^T = w^T v, \tag{3.15}
$$
because their product is a scalar and so, as noted above, equals its own transpose.
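Both (3.14) and (3.15) can be confirmed in a few lines (the particular matrices below are ours, chosen for illustration; NumPy assumed):

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])     # 2 x 3
B = np.array([[1, 0], [2, 1], [0, 3]])   # 3 x 2

print(np.array_equal((A @ B).T, B.T @ A.T))   # True: (AB)^T = B^T A^T

v = np.array([1., 2., 3.])
w = np.array([4., 5., 6.])
print(v @ w == w @ v)                          # True: v^T w = w^T v
```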
Lemma 3.8. If A is a nonsingular matrix, so is $A^T$, and its inverse is denoted
$$
A^{-T} = (A^T)^{-1} = (A^{-1})^T. \tag{3.16}
$$
Thus, transposing a matrix and then inverting yields the same result as first inverting and then transposing.

Proof: Let $X = (A^{-1})^T$. Then, according to (3.14),
$$
X A^T = (A^{-1})^T A^T = (A\, A^{-1})^T = I^T = I.
$$
The proof that $A^T X = I$ is similar, and so we conclude that $X = (A^T)^{-1}$. Q.E.D.
A particularly important class of square matrices is those that are unchanged by the
transpose operation.
Definition 3.9. A square matrix is called symmetric if it equals its own transpose: $A = A^T$.

Thus, A is symmetric if and only if its entries satisfy $a_{ji} = a_{ij}$ for all i, j. In other words, entries lying in “mirror image” positions relative to the main diagonal must be equal. For example, the most general symmetric 3 × 3 matrix has the form
$$
A = \begin{pmatrix} a & b & c \\ b & d & e \\ c & e & f \end{pmatrix}.
$$
Note that any diagonal matrix, including the identity, is symmetric. A lower or upper
triangular matrix is symmetric if and only if it is, in fact, a diagonal matrix.
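A symmetry test is a one-line comparison of a matrix with its transpose; in the sketch below the helper is_symmetric is our naming, and the final line illustrates that diagonal matrices are symmetric:

```python
import numpy as np

def is_symmetric(A):
    """A square matrix is symmetric exactly when it equals its own transpose."""
    return A.shape[0] == A.shape[1] and np.array_equal(A, A.T)

S = np.array([[1, 7, 3],
              [7, 4, -5],
              [3, -5, 6]])
print(is_symmetric(S))                    # True: mirror-image entries match
print(is_symmetric(np.diag([1, 3, 0])))   # True: diagonal matrices are symmetric
```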