The Little Book of Linear Algebra

A concise, beginner-friendly introduction to the core ideas of linear algebra.

Source: the-litte-book-of/linear-algebra

Chapter 1. Vectors

1.1 Scalars and Vectors

A scalar is a single numerical quantity, most often taken from the real numbers, denoted by $\mathbb{R}$. Scalars are the fundamental building blocks of arithmetic: they can be added, subtracted, multiplied, and divided, except that division by zero is undefined. In linear algebra, scalars play the role of coefficients, scaling factors, and entries of larger structures such as vectors and matrices. They provide the weights by which more complex objects are measured and combined.

A vector is an ordered collection of scalars, arranged either in a row or a column. When the scalars are real numbers, the vector is said to belong to real $n$-dimensional space, written

$$\mathbb{R}^n = \{(x_1, x_2, \dots, x_n) \mid x_i \in \mathbb{R}\}.$$

An element of $\mathbb{R}^n$ is called a vector of dimension $n$ or an $n$-vector. The number $n$ is called the dimension of the vector space. Thus $\mathbb{R}^2$ is the space of all ordered pairs of real numbers, $\mathbb{R}^3$ the space of all ordered triples, and so on.

Example 1.1.1.

Vectors are often written vertically in column form, which emphasizes their role in matrix multiplication:

$$\mathbf{v} = \begin{bmatrix} 2 \\ 0 \\ 5 \end{bmatrix} \in \mathbb{R}^3.$$

The vertical layout makes the structure clearer when we consider linear combinations or multiply matrices by vectors.

Geometric Interpretation

In $\mathbb{R}^2$, a vector $(x_1, x_2)$ can be visualized as an arrow starting at the origin $(0, 0)$ and ending at the point $(x_1, x_2)$. Its length corresponds to the distance from the origin, and its orientation gives a direction in the plane. In $\mathbb{R}^3$, the same picture extends into three dimensions: a vector is an arrow from the origin to $(x_1, x_2, x_3)$. Beyond three dimensions, direct visualization is no longer possible, but the algebraic rules of vectors remain identical. Even though we cannot draw a vector in $\mathbb{R}^{10}$, it behaves under addition, scaling, and transformation exactly as a 2- or 3-dimensional vector does. This abstract point of view is what allows linear algebra to apply to data science, physics, and machine learning, where data often lives in very high-dimensional spaces.

Thus a vector may be regarded in three complementary ways:

  1. As a point in space, described by its coordinates.
  2. As a displacement or arrow, described by a direction and a length.
  3. As an abstract element of a vector space, whose properties follow algebraic rules independent of geometry.

Notation

Why begin here?

Scalars and vectors form the atoms of linear algebra. Every structure we will build (vector spaces, linear transformations, matrices, eigenvalues) relies on the basic notions of number and ordered collection of numbers. Once vectors are understood, we can define operations such as addition and scalar multiplication, then generalize to subspaces, bases, and coordinate systems. Eventually, this framework grows into the full theory of linear algebra, with powerful applications to geometry, computation, and data.

Exercises 1.1

  1. Write three different vectors in $\mathbb{R}^2$ and sketch them as arrows from the origin. Identify their coordinates explicitly.
  2. Give an example of a vector in $\mathbb{R}^4$. Can you visualize it directly? Explain why high-dimensional visualization is challenging.
  3. Let $\mathbf{v} = (4, 3, 2)$. Write $\mathbf{v}$ in column form and state $v_1, v_2, v_3$.
  4. In what sense is the set $\mathbb{R}^1$ both a line and a vector space? Illustrate with examples.
  5. Consider the vector $\mathbf{u} = (1, 1, \dots, 1) \in \mathbb{R}^n$. What is special about this vector when $n$ is large? What might it represent in applications?

1.2 Vector Addition and Scalar Multiplication

Vectors in linear algebra are not static objects; their power comes from the operations we can perform on them. Two fundamental operations define the structure of vector spaces: addition and scalar multiplication. These operations satisfy simple but far-reaching rules that underpin the entire subject.

Vector Addition

Given two vectors of the same dimension, their sum is obtained by adding corresponding entries. Formally, if

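$$\mathbf{u} = (u_1, u_2, \dots, u_n), \quad \mathbf{v} = (v_1, v_2, \dots, v_n),$$

then their sum is

$$\mathbf{u} + \mathbf{v} = (u_1 + v_1, u_2 + v_2, \dots, u_n + v_n).$$

Example 1.2.1.
Let $\mathbf{u} = (2, 1, 3)$ and $\mathbf{v} = (4, 0, -5)$. Then

$$\mathbf{u} + \mathbf{v} = (2 + 4, 1 + 0, 3 + (-5)) = (6, 1, -2).$$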

Geometrically, vector addition corresponds to the parallelogram rule. If we draw both vectors as arrows from the origin, then placing the tail of one vector at the head of the other produces the sum. The diagonal of the parallelogram they form represents the resulting vector.

Scalar Multiplication

Multiplying a vector by a scalar stretches or shrinks the vector while preserving its direction, unless the scalar is negative, in which case the vector is also reversed. If $c \in \mathbb{R}$ and

$$\mathbf{v} = (v_1, v_2, \dots, v_n),$$

then

$$c\mathbf{v} = (c v_1, c v_2, \dots, c v_n).$$

Example 1.2.2.
Let $\mathbf{v} = (3, 2)$ and $c = -2$. Then

$$c\mathbf{v} = -2(3, 2) = (-6, -4).$$

This corresponds to flipping the vector through the origin and doubling its length.
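The two entrywise rules above translate almost directly into code. The following is a minimal sketch in plain Python, with vectors stored as tuples; the helper names `add` and `scale` are illustrative choices, not part of the text.

```python
def add(u, v):
    """Entrywise sum of two vectors of the same dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return tuple(a + b for a, b in zip(u, v))


def scale(c, v):
    """Multiply every entry of v by the scalar c."""
    return tuple(c * a for a in v)


u = (1, 2)
v = (3, -1)

print(add(u, v))        # (4, 1)
print(scale(-2, u))     # (-2, -4): reversed and twice as long
# Combining both operations gives a weighted sum such as 2u + 3v.
print(add(scale(2, u), scale(3, v)))  # (11, 1)
```

The dimension check mirrors the requirement that only vectors of the same dimension can be added.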

Linear Combinations

The interaction of addition and scalar multiplication allows us to form linear combinations. A linear combination of vectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k$ is any vector of the form

$$c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k, \quad c_i \in \mathbb{R}.$$

Linear combinations are the mechanism by which we generate new vectors from existing ones. The span of a set of vectors, that is, the collection of all their linear combinations, will later lead us to the idea of a subspace.

Example 1.2.3.
Let $\mathbf{v}_1 = (1, 0)$ and $\mathbf{v}_2 = (0, 1)$. Then any vector $(a, b) \in \mathbb{R}^2$ can be expressed as

$$a\mathbf{v}_1 + b\mathbf{v}_2.$$

Thus $(1, 0)$ and $(0, 1)$ form the basic building blocks of the plane.

Notation

Why this matters

Vector addition and scalar multiplication are the defining operations of linear algebra. They give structure to vector spaces, allow us to describe geometric phenomena like translation and scaling, and provide the foundation for solving systems of equations. Everything that follows (basis, dimension, transformations) builds on these simple but profound rules.

Exercises 1.2

  1. Compute $\mathbf{u} + \mathbf{v}$ where $\mathbf{u} = (1, 2, 3)$ and $\mathbf{v} = (4, 1, 0)$.
  2. Find $3\mathbf{v}$ where $\mathbf{v} = (2, 5)$. Sketch both vectors to illustrate the scaling.
  3. Show that $(5, 7)$ can be written as a linear combination of $(1, 0)$ and $(0, 1)$.
  4. Write $(4, 4)$ as a linear combination of $(1, 1)$ and $(1, -1)$.
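  5. Prove that if $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$, then $(c + d)(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v} + d\mathbf{u} + d\mathbf{v}$ for scalars $c, d \in \mathbb{R}$.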

1.3 Dot Product, Norms, and Angles

The dot product is the fundamental operation that links algebra and geometry in vector spaces. It allows us to measure lengths, compute angles, and determine orthogonality. From this single definition flow the notions of norm and angle, which give geometry to abstract vector spaces.

The Dot Product

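For two vectors $\mathbf{u}, \mathbf{v}$ in $\mathbb{R}^n$, the dot product (also called the inner product) is defined by

$$\mathbf{u} \cdot \mathbf{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n.$$

Equivalently, in matrix notation:

$$\mathbf{u} \cdot \mathbf{v} = \mathbf{u}^T \mathbf{v}.$$

Example 1.3.1.
Let $\mathbf{u} = (2, -1, 3)$ and $\mathbf{v} = (4, 0, -2)$. Then

$$\mathbf{u} \cdot \mathbf{v} = 2 \cdot 4 + (-1) \cdot 0 + 3 \cdot (-2) = 8 - 6 = 2.$$

The dot product outputs a single scalar, not another vector.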

Norms (Length of a Vector)

The Euclidean norm of a vector is the square root of its dot product with itself:

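$$\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}.$$

This generalizes the Pythagorean theorem to arbitrary dimensions.

Example 1.3.2.
For $\mathbf{v} = (3, 4)$,

$$\|\mathbf{v}\| = \sqrt{3^2 + 4^2} = \sqrt{25} = 5.$$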

This is exactly the length of the vector as an arrow in the plane.

Angles Between Vectors

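The dot product also encodes the angle between two vectors. For nonzero vectors $\mathbf{u}, \mathbf{v}$,

$$\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\| \, \|\mathbf{v}\| \cos\theta,$$

where $\theta$ is the angle between them. Thus,

$$\cos\theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \, \|\mathbf{v}\|}.$$

Example 1.3.3.
Let $\mathbf{u} = (1, 0)$ and $\mathbf{v} = (0, 1)$. Then

$$\mathbf{u} \cdot \mathbf{v} = 0, \quad \|\mathbf{u}\| = 1, \quad \|\mathbf{v}\| = 1.$$

Hence

$$\cos\theta = \frac{0}{1 \cdot 1} = 0 \quad \Rightarrow \quad \theta = \frac{\pi}{2}.$$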

The vectors are perpendicular.
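The three formulas of this section (dot product, norm, angle) can be checked numerically. Below is a small sketch using only the standard `math` module; the function names `dot`, `norm`, and `angle` are assumptions made for illustration.

```python
import math


def dot(u, v):
    """Sum of products of corresponding entries."""
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    """Euclidean norm: square root of the dot product of v with itself."""
    return math.sqrt(dot(v, v))


def angle(u, v):
    """Angle in radians between two nonzero vectors."""
    cos_theta = dot(u, v) / (norm(u) * norm(v))
    # Clamp to [-1, 1] so floating-point round-off cannot break acos.
    return math.acos(max(-1.0, min(1.0, cos_theta)))


print(norm((3, 4)))           # 5.0, as in Example 1.3.2
print(angle((1, 0), (0, 1)))  # pi/2, as in Example 1.3.3
```

The clamp is a practical safeguard rather than part of the mathematics: rounding can push the computed cosine slightly outside $[-1, 1]$ for nearly parallel vectors.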

Orthogonality

Two vectors are said to be orthogonal if their dot product is zero:

$$\mathbf{u} \cdot \mathbf{v} = 0.$$

Orthogonality generalizes the idea of perpendicularity from geometry to higher dimensions.

Notation

Why this matters

The dot product turns vector spaces into geometric objects: vectors gain lengths, angles, and notions of perpendicularity. This foundation will later support the study of orthogonal projections, Gram–Schmidt orthogonalization, eigenvectors, and least squares problems.

Exercises 1.3

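  1. Compute $\mathbf{u} \cdot \mathbf{v}$ for $\mathbf{u} = (1, 2, 3)$, $\mathbf{v} = (4, 5, 6)$.
  2. Find the norm of $\mathbf{v} = (2, 2, 1)$.
  3. Determine whether $\mathbf{u} = (1, 1, 0)$ and $\mathbf{v} = (1, 1, 2)$ are orthogonal.
  4. Let $\mathbf{u} = (3, 4)$, $\mathbf{v} = (4, 3)$. Compute the angle between them.
  5. Prove that $\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 + 2\,\mathbf{u} \cdot \mathbf{v}$. This identity is the algebraic version of the Law of Cosines.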

1.4 Orthogonality

Orthogonality captures the notion of perpendicularity in vector spaces. It is one of the most important geometric ideas in linear algebra, allowing us to decompose vectors, define projections, and construct special bases with elegant properties.

Definition

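Two vectors $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$ are said to be orthogonal if their dot product is zero:

$$\mathbf{u} \cdot \mathbf{v} = 0.$$

For nonzero vectors, this condition means that the angle between them is $\pi/2$ radians (90 degrees).

Example 1.4.1.
In $\mathbb{R}^2$, the vectors $(1, 2)$ and $(2, -1)$ are orthogonal since

$$(1, 2) \cdot (2, -1) = 1 \cdot 2 + 2 \cdot (-1) = 0.$$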

Orthogonal Sets

A collection of vectors is called orthogonal if every distinct pair of vectors in the set is orthogonal. If, in addition, each vector has norm 1, the set is called orthonormal.

Example 1.4.2.
In $\mathbb{R}^3$, the standard basis vectors

$$\mathbf{e}_1 = (1, 0, 0), \quad \mathbf{e}_2 = (0, 1, 0), \quad \mathbf{e}_3 = (0, 0, 1)$$

form an orthonormal set: each has length 1, and their dot products vanish when the indices differ.

Projections

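Orthogonality makes possible the decomposition of a vector into two components: one parallel to another vector, and one orthogonal to it. Given a nonzero vector $\mathbf{u}$ and any vector $\mathbf{v}$, the projection of $\mathbf{v}$ onto $\mathbf{u}$ is

$$\text{proj}_{\mathbf{u}}(\mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\mathbf{u} \cdot \mathbf{u}} \, \mathbf{u}.$$

The difference

$$\mathbf{v} - \text{proj}_{\mathbf{u}}(\mathbf{v})$$

is orthogonal to $\mathbf{u}$. Thus every vector can be decomposed uniquely into a parallel and a perpendicular part with respect to a given nonzero vector.

Example 1.4.3.
Let $\mathbf{u} = (1, 0)$, $\mathbf{v} = (2, 3)$. Then

$$\text{proj}_{\mathbf{u}}(\mathbf{v}) = \frac{(1, 0) \cdot (2, 3)}{(1, 0) \cdot (1, 0)} \, (1, 0) = \frac{2}{1} (1, 0) = (2, 0).$$

Thus

$$\mathbf{v} = (2, 3) = (2, 0) + (0, 3),$$

where $(2, 0)$ is parallel to $(1, 0)$ and $(0, 3)$ is orthogonal to it.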
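The projection formula and the resulting decomposition can be reproduced in a few lines. This sketch assumes integer (or `Fraction`) entries so that the arithmetic stays exact; `project` is a hypothetical helper name.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(v, u):
    """Projection of v onto the nonzero vector u: (u.v / u.u) u."""
    uu = dot(u, u)
    if uu == 0:
        raise ValueError("cannot project onto the zero vector")
    coeff = Fraction(dot(u, v), uu)  # exact; assumes integer entries
    return tuple(coeff * a for a in u)


u, v = (1, 0), (2, 3)
parallel = project(v, u)
perpendicular = tuple(a - b for a, b in zip(v, parallel))

# The perpendicular part should have zero dot product with u.
print(parallel, perpendicular, dot(perpendicular, u))
```

With the vectors of Example 1.4.3, the parallel part equals $(2, 0)$ and the perpendicular part equals $(0, 3)$.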

Orthogonal Decomposition

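In general, if $\mathbf{u} \neq \mathbf{0}$ and $\mathbf{v} \in \mathbb{R}^n$, then

$$\mathbf{v} = \text{proj}_{\mathbf{u}}(\mathbf{v}) + \bigl(\mathbf{v} - \text{proj}_{\mathbf{u}}(\mathbf{v})\bigr),$$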

where the first term is parallel to u and the second term is orthogonal. This decomposition underlies methods such as least squares approximation and the Gram–Schmidt process.

Notation

Why this matters

Orthogonality gives structure to vector spaces. It provides a way to separate independent directions cleanly, simplify computations, and minimize errors in approximations. Many powerful algorithms in numerical linear algebra and data science (QR decomposition, least squares regression, PCA) rely on orthogonality.

Exercises 1.4

  1. Verify that the vectors $(1, 2, 2)$ and $(2, 0, -1)$ are orthogonal.
  2. Find the projection of $(3, 4)$ onto $(1, 1)$.
  3. Show that any two distinct standard basis vectors in $\mathbb{R}^n$ are orthogonal.
  4. Decompose $(5, 2)$ into components parallel and orthogonal to $(2, 1)$.
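  5. Prove that $(\mathbf{u} + \mathbf{v}) \cdot (\mathbf{u} - \mathbf{v}) = \|\mathbf{u}\|^2 - \|\mathbf{v}\|^2$ for all $\mathbf{u}, \mathbf{v}$, and conclude that this product is zero exactly when $\|\mathbf{u}\| = \|\mathbf{v}\|$ (for instance, when $\mathbf{u}$ and $\mathbf{v}$ are orthonormal).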

Chapter 2. Matrices

2.1 Definition and Notation

Matrices are the central objects of linear algebra, providing a compact way to represent and manipulate linear transformations, systems of equations, and structured data. A matrix is a rectangular array of numbers arranged in rows and columns.

Formal Definition

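An $m \times n$ matrix is an array with $m$ rows and $n$ columns, written

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.$$

Each entry $a_{ij}$ is a scalar, located in the $i$-th row and $j$-th column. The size (or dimension) of the matrix is denoted by $m \times n$.

A matrix with a single row ($m = 1$) is a row vector, and a matrix with a single column ($n = 1$) is a column vector. Thus, vectors are simply special cases of matrices.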

Examples

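Example 2.1.1. A $2 \times 3$ matrix:

$$A = \begin{bmatrix} 1 & 2 & 4 \\ 0 & 3 & 5 \end{bmatrix}.$$

Here, $a_{12} = 2$, $a_{23} = 5$, and the matrix has 2 rows, 3 columns.

Example 2.1.2. A $3 \times 3$ square matrix:

$$B = \begin{bmatrix} 2 & 0 & 1 \\ 1 & 3 & 4 \\ 0 & 5 & 2 \end{bmatrix}.$$

This will later serve as the representation of a linear transformation on $\mathbb{R}^3$.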

Indexing and Notation

Thus, a matrix is a function $A : \{1, \dots, m\} \times \{1, \dots, n\} \to \mathbb{R}$, assigning a scalar to each row-column position.

Why this matters

Matrices generalize vectors and give us a language for describing linear operations systematically. They encode systems of equations, rotations, projections, and transformations of data. With matrices, algebra and geometry come together: a single compact object can represent both numerical data and functional rules.

Exercises 2.1

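  1. Write a $3 \times 2$ matrix of your choice and identify its entries $a_{ij}$.
  2. Is every vector a matrix? Is every matrix a vector? Explain.
  3. Which of the following are square matrices: $A \in \mathbb{R}^{4 \times 4}$, $B \in \mathbb{R}^{3 \times 5}$, $C \in \mathbb{R}^{1 \times 1}$?
  4. Let $D = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$. What kind of matrix is this?
  5. Consider the matrix $E = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. Express $e_{11}, e_{12}, e_{21}, e_{22}$ explicitly.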

2.2 Matrix Addition and Multiplication

Once matrices are defined, the next step is to understand how they combine. Just as vectors gain meaning through addition and scalar multiplication, matrices become powerful through two operations: addition and multiplication.

Matrix Addition

Two matrices of the same size are added by adding corresponding entries. If

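$$A = [a_{ij}] \in \mathbb{R}^{m \times n}, \quad B = [b_{ij}] \in \mathbb{R}^{m \times n},$$

then

$$A + B = [a_{ij} + b_{ij}] \in \mathbb{R}^{m \times n}.$$

Example 2.2.1.
Let

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} -1 & 0 \\ 5 & 2 \end{bmatrix}.$$

Then

$$A + B = \begin{bmatrix} 1 + (-1) & 2 + 0 \\ 3 + 5 & 4 + 2 \end{bmatrix} = \begin{bmatrix} 0 & 2 \\ 8 & 6 \end{bmatrix}.$$

Matrix addition is commutative ($A + B = B + A$) and associative ($(A + B) + C = A + (B + C)$). The zero matrix, with all entries 0, acts as the additive identity.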

Scalar Multiplication

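For a scalar $c \in \mathbb{R}$ and a matrix $A = [a_{ij}]$, we define

$$cA = [c\,a_{ij}].$$

This stretches or shrinks all entries of the matrix uniformly.

Example 2.2.2.
If

$$A = \begin{bmatrix} 2 & 1 \\ 0 & 3 \end{bmatrix}, \quad c = 2,$$

then

$$cA = \begin{bmatrix} 4 & 2 \\ 0 & 6 \end{bmatrix}.$$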

Matrix Multiplication

The defining operation of matrices is multiplication. If

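$$A \in \mathbb{R}^{m \times n}, \quad B \in \mathbb{R}^{n \times p},$$

then their product is the $m \times p$ matrix

$$AB = C = [c_{ij}], \quad c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.$$

Thus, the entry in the $i$-th row and $j$-th column of $AB$ is the dot product of the $i$-th row of $A$ with the $j$-th column of $B$.

Example 2.2.3.
Let

$$A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & -1 \\ 2 & 5 \end{bmatrix}.$$

Then

$$AB = \begin{bmatrix} 1 \cdot 4 + 2 \cdot 2 & 1 \cdot (-1) + 2 \cdot 5 \\ 0 \cdot 4 + 3 \cdot 2 & 0 \cdot (-1) + 3 \cdot 5 \end{bmatrix} = \begin{bmatrix} 8 & 9 \\ 6 & 15 \end{bmatrix}.$$

Notice that matrix multiplication is not commutative in general: $AB \neq BA$. Sometimes $BA$ may not even be defined if dimensions do not align.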
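The row-times-column rule can be written as a short function. The sketch below stores a matrix as a list of rows and uses a hypothetical helper `matmul`; it multiplies the matrices of Example 2.2.3 in both orders to show that the results differ.

```python
def matmul(A, B):
    """Product of an m x n matrix A and an n x p matrix B (lists of rows)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("inner dimensions must agree")
    p = len(B[0])
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]


A = [[1, 2], [0, 3]]
B = [[4, -1], [2, 5]]

print(matmul(A, B))  # [[8, 9], [6, 15]]
print(matmul(B, A))  # [[4, 5], [2, 19]], so AB != BA here
```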

Geometric Meaning

Matrix multiplication corresponds to the composition of linear transformations. If $B$ maps vectors in $\mathbb{R}^p$ to $\mathbb{R}^n$ and $A$ maps vectors in $\mathbb{R}^n$ to $\mathbb{R}^m$, then $AB$ represents applying $B$ first, then $A$. This makes matrices the algebraic language of transformations.

Notation

Why this matters

Matrix multiplication is the core mechanism of linear algebra: it encodes how transformations combine, how systems of equations are solved, and how data flows in modern algorithms. Addition and scalar multiplication make matrices into a vector space, while multiplication gives them an algebraic structure rich enough to model geometry, computation, and networks.

Exercises 2.2

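  1. Compute $A + B$ for $A = \begin{bmatrix} 2 & 3 \\ 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 4 & 2 \\ 5 & 7 \end{bmatrix}$.
  2. Find $3A$ where $A = \begin{bmatrix} 1 & 4 \\ 2 & 6 \end{bmatrix}$.
  3. Multiply $A = \begin{bmatrix} 1 & 0 & 2 \\ 1 & 3 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 2 & 1 \\ 0 & 1 \\ 3 & 4 \end{bmatrix}$.
  4. Verify with an explicit example that $AB \neq BA$.
  5. Prove that matrix multiplication is distributive: $A(B + C) = AB + AC$.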

2.3 Transpose and Inverse

Two special operations on matrices, the transpose and the inverse, give rise to deep algebraic and geometric properties. The transpose rearranges a matrix by flipping it across its main diagonal, while the inverse, when it exists, acts as the undo operation for matrix multiplication.

The Transpose

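The transpose of an $m \times n$ matrix $A = [a_{ij}]$ is the $n \times m$ matrix $A^T = [a_{ji}]$, obtained by swapping rows and columns.

Formally,

$$(A^T)_{ij} = a_{ji}.$$

Example 2.3.1.
If

$$A = \begin{bmatrix} 1 & 4 & 2 \\ 0 & 3 & 5 \end{bmatrix},$$

then

$$A^T = \begin{bmatrix} 1 & 0 \\ 4 & 3 \\ 2 & 5 \end{bmatrix}.$$

Properties of the Transpose.

  1. $(A^T)^T = A$.
  2. $(A + B)^T = A^T + B^T$.
  3. $(cA)^T = cA^T$, for scalar $c$.
  4. $(AB)^T = B^T A^T$.

The last rule is crucial: the order reverses.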

The Inverse

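A square matrix $A \in \mathbb{R}^{n \times n}$ is said to be invertible (or nonsingular) if there exists another matrix $A^{-1}$ such that

$$A A^{-1} = A^{-1} A = I_n,$$

where $I_n$ is the $n \times n$ identity matrix. In this case, $A^{-1}$ is called the inverse of $A$.

Not every matrix is invertible. A necessary (and, as it turns out, sufficient) condition is that $\det(A) \neq 0$, a fact that will be developed in a later chapter.

Example 2.3.2.
Let

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}.$$

Its determinant is $\det(A) = (1)(4) - (2)(3) = -2 \neq 0$. The inverse is

$$A^{-1} = \frac{1}{\det(A)} \begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ 1.5 & -0.5 \end{bmatrix}.$$

Verification:

$$A A^{-1} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} -2 & 1 \\ 1.5 & -0.5 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$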
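The $2 \times 2$ computation in Example 2.3.2 follows a fixed pattern: swap the diagonal entries, negate the off-diagonal ones, and divide by the determinant. Here is a minimal sketch of that pattern, assuming integer entries and using `Fraction` for exact results; `inverse_2x2` is an illustrative name.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix [[a, b], [c, d]] with nonzero determinant."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (determinant is zero)")
    f = Fraction(1, det)  # exact; assumes integer entries
    return [[f * d, -f * b], [-f * c, f * a]]


A = [[1, 2], [3, 4]]
A_inv = inverse_2x2(A)

# Multiply A by its inverse; the result should be the identity matrix.
product = [[sum(A[i][k] * A_inv[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
print(A_inv)
print(product)
```

This closed form is specific to $2 \times 2$ matrices; larger matrices are usually inverted by row reduction instead.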

Geometric Meaning

Notation

Why this matters

The transpose allows us to define symmetric and orthogonal matrices, central to geometry and numerical methods. The inverse underlies the solution of linear systems, encoding the idea of undoing a transformation. Together, these operations set the stage for determinants, eigenvalues, and orthogonalization.

Exercises 2.3

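  1. Compute the transpose of $A = \begin{bmatrix} 2 & 1 & 3 \\ 0 & 4 & 5 \end{bmatrix}$.
  2. Verify that $(AB)^T = B^T A^T$ for $A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 3 & 4 \\ 5 & 6 \end{bmatrix}$.
  3. Determine whether $C = \begin{bmatrix} 2 & 1 \\ 4 & 2 \end{bmatrix}$ is invertible. If so, find $C^{-1}$.
  4. Find the inverse of $D = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$, and explain its geometric action on vectors in the plane.
  5. Prove that if $A$ is invertible, then so is $A^T$, and $(A^T)^{-1} = (A^{-1})^T$.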

2.4 Special Matrices

Certain matrices occur so frequently in theory and applications that they are given special names. Recognizing their properties allows us to simplify computations and understand the structure of linear transformations more clearly.

The Identity Matrix

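The identity matrix $I_n$ is the $n \times n$ matrix with ones on the diagonal and zeros elsewhere:

$$I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}.$$

It acts as the multiplicative identity:

$$A I_n = I_n A = A, \quad \text{for all } A \in \mathbb{R}^{n \times n}.$$

Geometrically, $I_n$ represents the transformation that leaves every vector unchanged.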

Diagonal Matrices

A diagonal matrix has all off-diagonal entries zero:

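$$D = \begin{bmatrix} d_{11} & 0 & \cdots & 0 \\ 0 & d_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_{nn} \end{bmatrix}.$$

Multiplication by a diagonal matrix scales each coordinate independently:

$$D\mathbf{x} = (d_{11} x_1, d_{22} x_2, \dots, d_{nn} x_n).$$

Example 2.4.1.
Let

$$D = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} 1 \\ 4 \\ 2 \end{bmatrix}.$$

Then

$$D\mathbf{x} = \begin{bmatrix} 2 \\ 12 \\ 2 \end{bmatrix}.$$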

Permutation Matrices

A permutation matrix is obtained by permuting the rows of the identity matrix. Multiplying a vector by a permutation matrix reorders its coordinates.

Example 2.4.2.
Let

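$$P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

Then

$$P \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} b \\ a \\ c \end{bmatrix}.$$

Thus, $P$ swaps the first two coordinates.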

Permutation matrices are always invertible; their inverses are simply their transposes.
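As a quick illustration, the swap matrix from Example 2.4.2 can be applied to a concrete vector and then undone with its transpose. The helpers `matvec` and `transpose` below are assumed names for this sketch.

```python
def matvec(M, x):
    """Matrix-vector product, with M given as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def transpose(M):
    """Swap rows and columns."""
    return [list(col) for col in zip(*M)]


P = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 1]]
x = [10, 20, 30]

swapped = matvec(P, x)
restored = matvec(transpose(P), swapped)
print(swapped)   # [20, 10, 30]
print(restored)  # [10, 20, 30]
```

Applying the transpose after $P$ returns the original vector, which is what it means for the transpose to be the inverse.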

Symmetric and Skew-Symmetric Matrices

A matrix is symmetric if

$$A^T = A,$$

and skew-symmetric if

$$A^T = -A.$$

Symmetric matrices appear in quadratic forms and optimization, while skew-symmetric matrices describe rotations and cross products in geometry.

Orthogonal Matrices

A square matrix Q is orthogonal if

$$Q^T Q = Q Q^T = I.$$

Equivalently, the rows (and columns) of $Q$ form an orthonormal set. Orthogonal matrices preserve lengths and angles; they represent rotations and reflections.

Example 2.4.3.
The rotation matrix in the plane:

$$R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

is orthogonal, since

$$R(\theta)^T R(\theta) = I_2.$$
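The orthogonality of a rotation matrix can be spot-checked numerically. The sketch below assumes the standard counterclockwise rotation matrix and picks $\theta = \pi/6$ arbitrarily; because it uses floating-point arithmetic, comparisons are made with a small tolerance rather than exact equality.

```python
import math


def rotation(theta):
    """Counterclockwise rotation of the plane by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


R = rotation(math.pi / 6)
RtR = matmul(transpose(R), R)

# Compare R^T R with the 2 x 2 identity, entry by entry.
is_identity = all(
    math.isclose(RtR[i][j], float(i == j), abs_tol=1e-12)
    for i in range(2) for j in range(2)
)

# Rotating a vector should not change its length.
v = (3, 4)
rotated = [sum(a * b for a, b in zip(row, v)) for row in R]
print(is_identity, math.hypot(*rotated))
```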

Why this matters

Special matrices serve as the building blocks of linear algebra. Identity matrices define the neutral element, diagonal matrices simplify computations, permutation matrices reorder data, symmetric and orthogonal matrices describe fundamental geometric structures. Much of modern applied mathematics reduces complex problems to operations involving these simple forms.

Exercises 2.4

  1. Show that the product of two diagonal matrices is diagonal, and compute an example.
  2. Find the permutation matrix that cycles (a,b,c) into (b,c,a).
  3. Prove that every permutation matrix is invertible and its inverse is its transpose.
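  4. Verify that $Q = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ is orthogonal. What geometric transformation does it represent?
  5. Determine whether $A = \begin{bmatrix} 2 & 3 \\ 3 & 2 \end{bmatrix}$, $B = \begin{bmatrix} 0 & 5 \\ -5 & 0 \end{bmatrix}$ are symmetric, skew-symmetric, or neither.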

Chapter 3. Systems of Linear Equations

3.1 Linear Systems and Solutions

One of the central motivations for linear algebra is solving systems of linear equations. These systems arise naturally in science, engineering, and data analysis whenever multiple constraints interact. Matrices provide a compact language for expressing and solving them.

Linear Systems

A linear system consists of equations where each unknown appears only to the first power and with no products between variables. A general system of m equations in n unknowns can be written as:

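$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1, \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2, \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m. \end{aligned}$$

Here the coefficients $a_{ij}$ and constants $b_i$ are scalars, and the unknowns are $x_1, x_2, \dots, x_n$.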

Matrix Form

The system can be expressed compactly as:

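$$A\mathbf{x} = \mathbf{b},$$

where $A = [a_{ij}] \in \mathbb{R}^{m \times n}$ is the coefficient matrix, $\mathbf{x} = (x_1, \dots, x_n)^T$ is the vector of unknowns, and $\mathbf{b} = (b_1, \dots, b_m)^T$ is the vector of constants.

This formulation turns the problem of solving equations into analyzing the action of a matrix.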

Example 3.1.1.
The system

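$$\begin{cases} x + 2y = 5, \\ 3x - y = 4 \end{cases}$$

can be written as

$$\begin{bmatrix} 1 & 2 \\ 3 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 5 \\ 4 \end{bmatrix}.$$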

Types of Solutions

A linear system may have:

  1. No solution (inconsistent): The equations conflict.
    Example:

    $$\begin{cases} x + y = 1, \\ x + y = 2 \end{cases}$$

    has no solution.

  2. Exactly one solution (unique): The lines (or planes) described by the equations intersect at a single point.
    Example: The above system with coefficient matrix $\begin{bmatrix} 1 & 2 \\ 3 & -1 \end{bmatrix}$ has a unique solution.

  3. Infinitely many solutions: The equations describe overlapping constraints (e.g., multiple equations representing the same line or plane).

The nature of the solution depends on the rank of $A$ and its relation to the augmented matrix $(A \mid \mathbf{b})$, which we will study later.

Geometric Interpretation

Why this matters

Linear systems are the practical foundation of linear algebra. They appear in balancing chemical reactions, circuit analysis, least-squares regression, optimization, and computer graphics. Understanding how to represent and classify their solutions is the first step toward systematic solution methods like Gaussian elimination.

Exercises 3.1

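  1. Write the following system in matrix form:
    $$\begin{cases} 2x + 3y - z = 7, \\ x - y + 4z = 1, \\ 3x + 2y + z = 5 \end{cases}$$
  2. Determine whether the system
    $$\begin{cases} x + y = 1, \\ 2x + 2y = 2 \end{cases}$$
    has no solution, one solution, or infinitely many solutions.
  3. Geometrically interpret the system
    $$\begin{cases} x + y = 3, \\ x - y = 1 \end{cases}$$
    in the plane.
  4. Solve the system
    $$\begin{cases} 2x + y = 1, \\ x - y = 4 \end{cases}$$
    and check your solution.
  5. In $\mathbb{R}^3$, describe the solution set of
    $$\begin{cases} x + y + z = 0, \\ 2x + 2y + 2z = 0. \end{cases}$$
    What geometric object does it represent?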

3.2 Gaussian Elimination

To solve linear systems efficiently, we use Gaussian elimination: a systematic method of transforming a system into a simpler equivalent one whose solutions are easier to see. The method relies on elementary row operations that preserve the solution set.

Elementary Row Operations

On an augmented matrix (A|b), we are allowed three operations:

  1. Row swapping: interchange two rows.
  2. Row scaling: multiply a row by a nonzero scalar.
  3. Row replacement: replace one row by itself plus a multiple of another row.

These operations correspond to re-expressing equations in different but equivalent forms.

Row Echelon Form

A matrix is in row echelon form (REF) if:

  1. All nonzero rows are above any zero rows.
  2. Each leading entry (the first nonzero number from the left in a row) is to the right of the leading entry in the row above.
  3. All entries below a leading entry are zero.

Further, if each leading entry is 1 and is the only nonzero entry in its column, the matrix is in reduced row echelon
form (RREF).

Algorithm of Gaussian Elimination

  1. Write the augmented matrix for the system.
  2. Use row operations to create zeros below each pivot (the leading entry in a row).
  3. Continue column by column until the matrix is in echelon form.
  4. Solve by back substitution: starting from the last pivot equation and working upward.

If we continue to RREF, the solution can be read off directly.

Example

Example 3.2.1. Solve

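$$\begin{cases} x + 2y - z = 3, \\ 2x + y + z = 7, \\ 3x - y + 2z = 4. \end{cases}$$

Step 1. Augmented matrix

$$\left[\begin{array}{ccc|c} 1 & 2 & -1 & 3 \\ 2 & 1 & 1 & 7 \\ 3 & -1 & 2 & 4 \end{array}\right].$$

Step 2. Eliminate below the first pivot

Subtract 2 times row 1 from row 2, and 3 times row 1 from row 3:

$$\left[\begin{array}{ccc|c} 1 & 2 & -1 & 3 \\ 0 & -3 & 3 & 1 \\ 0 & -7 & 5 & -5 \end{array}\right].$$

Step 3. Pivot in column 2

Divide row 2 by $-3$:

$$\left[\begin{array}{ccc|c} 1 & 2 & -1 & 3 \\ 0 & 1 & -1 & -\tfrac{1}{3} \\ 0 & -7 & 5 & -5 \end{array}\right].$$

Add 7 times row 2 to row 3:

$$\left[\begin{array}{ccc|c} 1 & 2 & -1 & 3 \\ 0 & 1 & -1 & -\tfrac{1}{3} \\ 0 & 0 & -2 & -\tfrac{22}{3} \end{array}\right].$$

Step 4. Pivot in column 3

Divide row 3 by $-2$:

$$\left[\begin{array}{ccc|c} 1 & 2 & -1 & 3 \\ 0 & 1 & -1 & -\tfrac{1}{3} \\ 0 & 0 & 1 & \tfrac{11}{3} \end{array}\right].$$

Step 5. Back substitution

From the last row:

$$z = \tfrac{11}{3}.$$

Second row:

$$y - z = -\tfrac{1}{3} \quad \Rightarrow \quad y = -\tfrac{1}{3} + \tfrac{11}{3} = \tfrac{10}{3}.$$

First row:

$$x + 2y - z = 3 \quad \Rightarrow \quad x + 2 \cdot \tfrac{10}{3} - \tfrac{11}{3} = 3.$$

So

$$x + \tfrac{20}{3} - \tfrac{11}{3} = 3 \quad \Rightarrow \quad x + 3 = 3 \quad \Rightarrow \quad x = 0.$$

Solution:

$$(x, y, z) = \left(0, \tfrac{10}{3}, \tfrac{11}{3}\right).$$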
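The elimination and back-substitution steps can be sketched in code. The version below is a simplified illustration rather than a production solver: it assumes a square system with a unique solution, swaps rows only when a pivot is zero, and uses `Fraction` so the arithmetic is exact. The function name `solve` is hypothetical.

```python
from fractions import Fraction


def solve(aug):
    """Solve a square system from its augmented matrix [A | b].

    Assumes a unique solution exists; raises ValueError otherwise.
    """
    n = len(aug)
    M = [[Fraction(entry) for entry in row] for row in aug]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry here.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("no unique solution")
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so that the pivot equals 1.
        pivot_value = M[col][col]
        M[col] = [entry / pivot_value for entry in M[col]]
        # Create zeros below the pivot.
        for r in range(col + 1, n):
            factor = M[r][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    # Back substitution, from the last pivot row upward.
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        x[i] = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
    return x


solution = solve([[1, 2, -1, 3],
                  [2, 1, 1, 7],
                  [3, -1, 2, 4]])
print(solution)
```

For the system of Example 3.2.1 this returns the exact values $0$, $10/3$, and $11/3$. Floating-point implementations typically choose the largest available pivot instead of the first nonzero one, to limit round-off error.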

Why this matters

Gaussian elimination is the foundation of computational linear algebra. It reduces complex systems to a form where
solutions are visible, and it forms the basis for algorithms used in numerical analysis, scientific computing, and
machine learning.

Exercises 3.2

  1. Solve by Gaussian elimination:
    $$\begin{cases} x + y = 2, \\ 2x - y = 0. \end{cases}$$
  2. Reduce the following augmented matrix to REF:
    $$\left[\begin{array}{ccc|c} 1 & 1 & 1 & 6 \\ 2 & 1 & 3 & 14 \\ 1 & 4 & 2 & 2 \end{array}\right].$$
  3. Show that Gaussian elimination always produces either:

    • a unique solution,
    • infinitely many solutions, or
    • a contradiction (no solution).
  4. Use Gaussian elimination to find all solutions of
    $$\begin{cases} x + y + z = 0, \\ 2x + y + z = 1. \end{cases}$$
  5. Explain why pivoting (choosing the largest available pivot element) is useful in numerical computation.

3.3 Rank and Consistency

Gaussian elimination not only provides solutions but also reveals the structure of a linear system. Two key ideas are the rank of a matrix and the consistency of a system. Rank measures the amount of independent information in the equations, while consistency determines whether the system has at least one solution.

Rank of a Matrix

The rank of a matrix is the number of pivots (leading entries) in its row echelon form. Equivalently, it is the maximum number of linearly independent rows or columns.

Formally,

$$\text{rank}(A) = \dim(\text{row space of } A) = \dim(\text{column space of } A).$$

The rank tells us the effective dimension of the space spanned by the rows (or columns).

Example 3.3.1.
For

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 9 \end{bmatrix},$$

row reduction gives

$$\begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

Thus, $\text{rank}(A) = 1$, since all rows are multiples of the first.

Consistency of Linear Systems

Consider the system $A\mathbf{x} = \mathbf{b}$.
The system is consistent (has at least one solution) if and only if

$$\text{rank}(A) = \text{rank}(A \mid \mathbf{b}),$$

where $(A \mid \mathbf{b})$ is the augmented matrix.
If the ranks differ, the system is inconsistent.

Example

Example 3.3.2.
Consider

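$$\begin{cases} x + y + z = 1, \\ 2x + 2y + 2z = 2, \\ x + y + z = 3. \end{cases}$$

The augmented matrix is

$$\left[\begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 2 & 2 & 2 & 2 \\ 1 & 1 & 1 & 3 \end{array}\right].$$

Row reduction gives

$$\left[\begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 \end{array}\right].$$

Here, $\text{rank}(A) = 1$, but $\text{rank}(A \mid \mathbf{b}) = 2$. Since the ranks differ, the system is inconsistent: no solution exists.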

Example with Infinite Solutions

Example 3.3.3.
For

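$$\begin{cases} x + y = 2, \\ 2x + 2y = 4, \end{cases}$$

the augmented matrix reduces to

$$\left[\begin{array}{cc|c} 1 & 1 & 2 \\ 0 & 0 & 0 \end{array}\right].$$

Here, $\text{rank}(A) = \text{rank}(A \mid \mathbf{b}) = 1 < 2$. Thus, infinitely many solutions exist, forming a line.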
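The rank test can be turned into a small classifier. The sketch below counts pivots by row reduction with exact `Fraction` arithmetic and then compares $\text{rank}(A)$, $\text{rank}(A \mid \mathbf{b})$, and the number of unknowns $n$; the names `rank` and `classify` are illustrative assumptions.

```python
from fractions import Fraction


def rank(rows):
    """Rank via row reduction: count the pivots found."""
    M = [[Fraction(entry) for entry in row] for row in rows]
    pivots = 0
    for col in range(len(M[0])):
        # Look for a nonzero entry in this column among the unused rows.
        pivot = next((r for r in range(pivots, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivots], M[pivot] = M[pivot], M[pivots]
        for r in range(pivots + 1, len(M)):
            factor = M[r][col] / M[pivots][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[pivots])]
        pivots += 1
        if pivots == len(M):
            break
    return pivots


def classify(A, b):
    """Return 'inconsistent', 'unique', or 'infinitely many' for Ax = b."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    r_a, r_aug, n = rank(A), rank(augmented), len(A[0])
    if r_a != r_aug:
        return "inconsistent"
    return "unique" if r_a == n else "infinitely many"


print(classify([[1, 1, 1], [2, 2, 2], [1, 1, 1]], [1, 2, 3]))  # inconsistent
print(classify([[1, 1], [2, 2]], [2, 4]))                      # infinitely many
print(classify([[1, 2], [3, -1]], [5, 4]))                     # unique
```

The three calls correspond to Example 3.3.2, Example 3.3.3, and the system of Example 3.1.1.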

Why this matters

Rank is a measure of independence: it tells us how many truly distinct equations or directions are present. Consistency explains when equations align versus when they contradict. These concepts connect linear systems to vector spaces and prepare for the ideas of dimension, basis, and the Rank–Nullity Theorem.

Exercises 3.3

  1. Compute the rank of $A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 2 & 5 & 1 \end{bmatrix}$.
  2. Determine whether the system
    $$\begin{cases} x + y + z = 1, \\ 2x + 3y + z = 2, \\ 3x + 5y + 2z = 3 \end{cases}$$
    is consistent.
  3. Show that the rank of the identity matrix $I_n$ is $n$.
  4. Give an example of a system in $\mathbb{R}^3$ with infinitely many solutions, and explain why it satisfies the rank condition.
  5. Prove that for any matrix $A \in \mathbb{R}^{m \times n}$, $\text{rank}(A) \leq \min(m, n)$.

3.4 Homogeneous Systems

A homogeneous system is a linear system in which all constant terms are zero:

$$A\mathbf{x} = \mathbf{0},$$

where $A \in \mathbb{R}^{m \times n}$, and $\mathbf{0}$ is the zero vector in $\mathbb{R}^m$.

The Trivial Solution

Every homogeneous system has at least one solution:

$$\mathbf{x} = \mathbf{0}.$$

This is called the trivial solution. The interesting question is whether nontrivial solutions (nonzero vectors) exist.

Existence of Nontrivial Solutions

Nontrivial solutions exist precisely when the number of unknowns exceeds the rank of the coefficient matrix:

$$\text{rank}(A) < n.$$

In this case, there are infinitely many solutions, forming a subspace of $\mathbb{R}^n$. The dimension of this solution space is

$$\dim(\text{null}(A)) = n - \text{rank}(A),$$

where $\text{null}(A)$ is the set of all solutions to $A\mathbf{x} = \mathbf{0}$. This set is called the null space or kernel of $A$.

Example

Example 3.4.1.
Consider

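$$\begin{cases} x + y + z = 0, \\ 2x + y - z = 0. \end{cases}$$

The augmented matrix is

$$\left[\begin{array}{ccc|c} 1 & 1 & 1 & 0 \\ 2 & 1 & -1 & 0 \end{array}\right].$$

Row reduction:

$$\left[\begin{array}{ccc|c} 1 & 1 & 1 & 0 \\ 0 & -1 & -3 & 0 \end{array}\right] \;\to\; \left[\begin{array}{ccc|c} 1 & 1 & 1 & 0 \\ 0 & 1 & 3 & 0 \end{array}\right].$$

So the system is equivalent to:

$$\begin{cases} x + y + z = 0, \\ y + 3z = 0. \end{cases}$$

From the second equation, $y = -3z$. Substituting into the first:

$$x - 3z + z = 0 \quad \Rightarrow \quad x = 2z.$$

Thus solutions are:

$$(x, y, z) = z(2, -3, 1), \quad z \in \mathbb{R}.$$

The null space is the line spanned by the vector $(2, -3, 1)$.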
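A claimed null-space direction is easy to check by substitution. The sketch below takes the system of Example 3.4.1 as $x + y + z = 0$ and $2x + y - z = 0$ and confirms that several multiples of $(2, -3, 1)$ are mapped to the zero vector. This is a spot check of a handful of multiples, not a proof; `matvec` is an assumed helper name.

```python
def matvec(A, x):
    """Matrix-vector product, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


A = [[1, 1, 1],
     [2, 1, -1]]
direction = (2, -3, 1)

# Every scalar multiple z * direction should satisfy A x = 0.
residuals = [matvec(A, [z * d for d in direction]) for z in (-2, 0, 1, 5)]
print(residuals)

# With n = 3 unknowns and rank(A) = 2, the nullity is n - rank(A) = 1.
nullity = 3 - 2
print(nullity)
```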

Geometric Interpretation

The solution set of a homogeneous system is always a subspace of $\mathbb{R}^n$.

More generally, the null space has dimension $n - \text{rank}(A)$, known as the nullity.

Why this matters

Homogeneous systems are central to understanding vector spaces, subspaces, and dimension. They lead directly to the concepts of kernel, null space, and linear dependence. In applications, homogeneous systems appear in equilibrium problems, eigenvalue equations, and computer graphics transformations.

Exercises 3.4

  1. Solve the homogeneous system
    $$\begin{cases} x + 2y - z = 0, \\ 2x + 4y - 2z = 0. \end{cases}$$
    What is the dimension of its solution space?
  2. Find all solutions of
    $$\begin{cases} x - y + z = 0, \\ 2x + y - z = 0. \end{cases}$$
  3. Show that the solution set of any homogeneous system is a subspace of $\mathbb{R}^n$.
  4. Suppose $A$ is a $3 \times 3$ matrix with $\text{rank}(A) = 2$. What is the dimension of the null space of $A$?
  5. For $A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 3 \end{bmatrix}$, compute a basis for the null space of $A$.

Chapter 4. Vector Spaces

4.1 Definition of a Vector Space

Up to now we have studied vectors and matrices concretely in $\mathbb{R}^n$. The next step is to move beyond coordinates and define vector spaces in full generality. A vector space is an abstract setting where the familiar rules of addition and scalar multiplication hold, regardless of whether the elements are geometric vectors, polynomials, functions, or other objects.

Formal Definition

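A vector space over the real numbers $\mathbb{R}$ is a set $V$ equipped with two operations:

  1. Vector addition: For any $\mathbf{u}, \mathbf{v} \in V$, there is a vector $\mathbf{u} + \mathbf{v} \in V$.
  2. Scalar multiplication: For any scalar $c \in \mathbb{R}$ and any $\mathbf{v} \in V$, there is a vector $c\mathbf{v} \in V$.

These operations must satisfy the following axioms (for all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$ and all scalars $a, b \in \mathbb{R}$):

  1. Commutativity of addition: $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$.
  2. Associativity of addition: $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$.
  3. Additive identity: There exists a zero vector $\mathbf{0} \in V$ such that $\mathbf{v} + \mathbf{0} = \mathbf{v}$.
  4. Additive inverses: For each $\mathbf{v} \in V$, there exists $-\mathbf{v} \in V$ such that $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$.
  5. Compatibility of scalar multiplication: $a(b\mathbf{v}) = (ab)\mathbf{v}$.
  6. Identity element of scalars: $1 \cdot \mathbf{v} = \mathbf{v}$.
  7. Distributivity over vector addition: $a(\mathbf{u} + \mathbf{v}) = a\mathbf{u} + a\mathbf{v}$.
  8. Distributivity over scalar addition: $(a + b)\mathbf{v} = a\mathbf{v} + b\mathbf{v}$.

If a set $V$ with operations satisfies all eight axioms, we call it a vector space.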

Examples

Example 4.1.1. Standard Euclidean space
$\mathbb{R}^n$ with ordinary addition and scalar multiplication is a vector space. This is the model case from which the axioms are abstracted.

Example 4.1.2. Polynomials
The set of all polynomials with real coefficients, denoted $\mathbb{R}[x]$, forms a vector space. Addition and scalar multiplication are defined term by term.

Example 4.1.3. Functions
The set of all real-valued functions on an interval, e.g. $f : [0, 1] \to \mathbb{R}$, forms a vector space, since functions can be added and scaled pointwise.

Non-Examples

Not every set with operations qualifies. For instance, the set of positive real numbers under usual addition is not a vector space, because additive inverses (negative numbers) are missing. The axioms must all hold.

Geometric Interpretation

In familiar cases like $\mathbb{R}^2$ or $\mathbb{R}^3$, vector spaces provide the stage for geometry: vectors can be added, scaled, and combined to form lines, planes, and higher-dimensional structures. In abstract settings like function spaces, the same algebraic rules let us apply geometric intuition to infinite-dimensional problems.

Why this matters

The concept of vector space unifies seemingly different mathematical objects under a single framework. Whether dealing with forces in physics, signals in engineering, or data in machine learning, the common language of vector spaces allows us to use the same techniques everywhere.

Exercises 4.1

  1. Verify that $\mathbb{R}^2$ with standard addition and scalar multiplication satisfies all eight vector space axioms.
  2. Show that the set of integers $\mathbb{Z}$ with ordinary operations is not a vector space over $\mathbb{R}$. Which axiom fails?
  3. Consider the set of all polynomials of degree at most 3. Show it forms a vector space over $\mathbb{R}$. What is its dimension?
  4. Give an example of a vector space where the vectors are not geometric objects.
  5. Prove that in any vector space, the zero vector is unique.

4.2 Subspaces

A subspace is a smaller vector space living inside a larger one. Just as lines and planes naturally sit inside three-dimensional space, subspaces generalize these ideas to higher dimensions and more abstract settings.

Definition

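Let $V$ be a vector space. A subset $W \subseteq V$ is called a subspace of $V$ if:

  1. $\mathbf{0} \in W$ (contains the zero vector),
  2. For all $\mathbf{u}, \mathbf{v} \in W$, the sum $\mathbf{u} + \mathbf{v} \in W$ (closed under addition),
  3. For all scalars $c \in \mathbb{R}$ and vectors $\mathbf{v} \in W$, the product $c\mathbf{v} \in W$ (closed under scalar multiplication).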

If these hold, then W is itself a vector space with the inherited operations.

Examples

Example 4.2.1. Line through the origin in $\mathbb{R}^2$
The set

$$W = \{(t, 2t) \mid t \in \mathbb{R}\}$$

is a subspace of $\mathbb{R}^2$. It contains the zero vector, is closed under addition, and is closed under scalar multiplication.

Example 4.2.2. The x–y plane in $\mathbb{R}^3$
The set

$$W = \{(x, y, 0) \mid x, y \in \mathbb{R}\}$$

is a subspace of $\mathbb{R}^3$. It is the x–y plane itself: the plane through the origin consisting of all vectors whose third coordinate is zero.

Example 4.2.3. Null space of a matrix
For a matrix $A \in \mathbb{R}^{m \times n}$, the null space

$$\{x \in \mathbb{R}^n \mid Ax = 0\}$$

is a subspace of $\mathbb{R}^n$. This subspace represents all solutions to the homogeneous system $Ax = 0$.

Non-Examples

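Not every subset is a subspace. For instance, a line that does not pass through the origin is not a subspace, because it does not contain the zero vector.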

Geometric Interpretation

Subspaces are the linear structures inside vector spaces.

Why this matters

Subspaces capture the essential structure of linear problems. Column spaces, row spaces, and null spaces are all subspaces. Much of linear algebra consists of understanding how these subspaces intersect, span, and complement each other.

Exercises 4.2

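  1. Prove that the set $W = \{(x, 0) \mid x \in \mathbb{R}\} \subseteq \mathbb{R}^2$ is a subspace.
  2. Show that the line $\{(1+t, 2t) \mid t \in \mathbb{R}\}$ is not a subspace of $\mathbb{R}^2$. Which condition fails?
  3. Determine whether the set of all vectors $(x, y, z) \in \mathbb{R}^3$ satisfying $x + y + z = 0$ is a subspace.
  4. For the matrix
     $$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix},$$
     describe the null space of $A$ as a subspace of $\mathbb{R}^3$.
  5. List all possible subspaces of $\mathbb{R}^2$.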

4.3 Span, Basis, Dimension

The ideas of span, basis, and dimension provide the language for describing the size and structure of subspaces. Together, they tell us how a vector space is generated, how many building blocks it requires, and how those blocks can be chosen.

Span

Given a set of vectors $v_1, v_2, \dots, v_k \in V$, the span is the collection of all linear combinations:

$$\operatorname{span}\{v_1, \dots, v_k\} = \{c_1 v_1 + \cdots + c_k v_k \mid c_i \in \mathbb{R}\}.$$

The span is always a subspace of $V$, namely the smallest subspace containing those vectors.

Example 4.3.1.
In $\mathbb{R}^2$, $\operatorname{span}\{(1,0)\} = \{(x,0) \mid x \in \mathbb{R}\}$, the x-axis.
Similarly, $\operatorname{span}\{(1,0),(0,1)\} = \mathbb{R}^2$.

Basis

A basis of a vector space V is a set of vectors that:

  1. Span V.
  2. Are linearly independent (no vector in the set is a linear combination of the others).

If either condition fails, the set is not a basis.

Example 4.3.2.
In $\mathbb{R}^3$, the standard unit vectors

$$e_1 = (1,0,0), \quad e_2 = (0,1,0), \quad e_3 = (0,0,1)$$

form a basis. Every vector $(x, y, z)$ can be uniquely written as

$$x e_1 + y e_2 + z e_3.$$

Dimension

The dimension of a vector space $V$, written $\dim(V)$, is the number of vectors in any basis of $V$. This number is well-defined: all bases of a vector space have the same cardinality.

Examples 4.3.3.

Geometric Interpretation

Lines, planes, and higher-dimensional flats can all be described in terms of span, basis, and dimension.

Why this matters

These concepts classify vector spaces and subspaces in terms of size and structure. Many theorems in linear algebra, such as the Rank–Nullity Theorem, are consequences of understanding span, basis, and dimension. In practical terms, bases are how we encode data in coordinates, and dimension tells us how much freedom a system truly has.

Exercises 4.3

  1. Show that $(1,0,0)$, $(0,1,0)$, $(1,1,0)$ span the xy-plane in $\mathbb{R}^3$. Are they a basis?
  2. Find a basis for the line $\{(2t, 3t, t) : t \in \mathbb{R}\}$ in $\mathbb{R}^3$.
  3. Determine the dimension of the subspace of $\mathbb{R}^3$ defined by $x + y + z = 0$.
  4. Prove that every basis of $\mathbb{R}^n$ contains exactly $n$ vectors.
  5. Give a basis for the set of polynomials of degree at most 2. What is its dimension?

4.4 Coordinates

Once a basis for a vector space is chosen, every vector can be expressed uniquely as a linear combination of the basis vectors. The coefficients in this combination are called the coordinates of the vector relative to that basis. Coordinates allow us to move between the abstract world of vector spaces and the concrete world of numbers.

Coordinates Relative to a Basis

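Let $V$ be a vector space, and let

$$B = \{v_1, v_2, \dots, v_n\}$$

be an ordered basis for $V$. Every vector $u \in V$ can be written uniquely as

$$u = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n.$$

The scalars $(c_1, c_2, \dots, c_n)$ are the coordinates of $u$ relative to $B$, written

$$[u]_B = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}.$$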

Example in R2

Example 4.4.1.
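Let the basis be

$$B = \{(1,1), (1,-1)\}.$$

To find the coordinates of $u = (3,1)$ relative to $B$, solve

$$(3,1) = c_1 (1,1) + c_2 (1,-1).$$

This gives the system

$$\begin{cases} c_1 + c_2 = 3, \\ c_1 - c_2 = 1. \end{cases}$$

Adding the two equations: $2c_1 = 4 \implies c_1 = 2$. Then $c_2 = 1$.

So,

$$[u]_B = \begin{bmatrix} 2 \\ 1 \end{bmatrix}.$$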

Standard Coordinates

In $\mathbb{R}^n$, the standard basis is

$$e_1 = (1,0,\dots,0), \quad e_2 = (0,1,0,\dots,0), \quad \dots, \quad e_n = (0,\dots,0,1).$$

Relative to this basis, the coordinates of a vector are simply its entries. Thus, column vectors are coordinate representations by default.

Change of Basis

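If $B = \{v_1, \dots, v_n\}$ is a basis of $\mathbb{R}^n$, the change of basis matrix is

$$P = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix},$$

with basis vectors as columns. For any vector $u$,

$$u = P[u]_B, \qquad [u]_B = P^{-1}u.$$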

Thus, switching between bases reduces to matrix multiplication.
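The two formulas above can be tried out on a small case. The following is a minimal sketch in plain Python, assuming the basis $\{(1,1),(1,-1)\}$ of Example 4.4.1; the helper names `inverse_2x2` and `mat_vec` are illustrative and not part of the text.

```python
from fractions import Fraction

def inverse_2x2(P):
    """Invert an integer 2x2 matrix [[a, b], [c, d]] with the adjugate formula."""
    (a, b), (c, d) = P
    det = a * d - b * c
    if det == 0:
        raise ValueError("columns do not form a basis: matrix is singular")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def mat_vec(M, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

# Basis vectors (1, 1) and (1, -1) are placed as the COLUMNS of P.
P = [[1, 1],
     [1, -1]]
u = [3, 1]

coords = mat_vec(inverse_2x2(P), u)   # [u]_B = P^{-1} u
recovered = mat_vec(P, coords)        # u = P [u]_B

print(coords)      # coordinates 2 and 1, stored as Fractions
print(recovered)   # back to 3 and 1
```

Exact fractions are used here only to keep the arithmetic free of rounding; floating-point numbers would work the same way for well-conditioned bases.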

Geometric Interpretation

Coordinates are the address of a vector relative to a chosen set of directions. Different bases are like different coordinate systems: Cartesian, rotated, skewed, or scaled. The same vector may look very different numerically depending on the basis, but its geometric identity is unchanged.

Why this matters

Coordinates turn abstract vectors into concrete numerical data. Changing basis is the algebraic language for rotations of axes, diagonalization of matrices, and principal component analysis in data science. Mastery of coordinates is essential for moving fluidly between geometry, algebra, and computation.

Exercises 4.4

  1. Express $(4,2)$ in terms of the basis $\{(1,1),(1,-1)\}$.
  2. Find the coordinates of $(1,2,3)$ relative to the standard basis of $\mathbb{R}^3$.
  3. If $B = \{(2,0),(0,3)\}$, compute $[(4,6)]_B$.
  4. Construct the change of basis matrix from the standard basis of $\mathbb{R}^2$ to $B = \{(1,1),(1,-1)\}$.
  5. Prove that coordinate representation with respect to a basis is unique.

Chapter 5. Linear Transformations

5.1 Functions that Preserve Linearity

A central theme of linear algebra is understanding linear transformations: functions between vector spaces that preserve their algebraic structure. These transformations generalize the idea of matrix multiplication and capture the essence of linear behavior.

Definition

Let $V$ and $W$ be vector spaces over $\mathbb{R}$. A function

$$T: V \to W$$

is called a linear transformation (or linear map) if for all vectors $u, v \in V$ and all scalars $c \in \mathbb{R}$:

  1. Additivity:
     $$T(u+v) = T(u) + T(v),$$
  2. Homogeneity:
     $$T(cu) = cT(u).$$

If both conditions hold, then $T$ automatically respects linear combinations:

$$T(c_1 v_1 + \cdots + c_k v_k) = c_1 T(v_1) + \cdots + c_k T(v_k).$$

Examples

Example 5.1.1. Scaling in $\mathbb{R}^2$.
Let $T: \mathbb{R}^2 \to \mathbb{R}^2$ be defined by

$$T(x,y) = (2x, 2y).$$

This doubles the length of every vector, preserving direction. It is linear.

Example 5.1.2. Rotation.
Let $R_\theta: \mathbb{R}^2 \to \mathbb{R}^2$ be

$$R_\theta(x,y) = (x\cos\theta - y\sin\theta,\; x\sin\theta + y\cos\theta).$$

This rotates vectors by angle $\theta$. It satisfies additivity and homogeneity, hence is linear.

Example 5.1.3. Differentiation.
Let $D: \mathbb{R}[x] \to \mathbb{R}[x]$ be differentiation: $D(p(x)) = p'(x)$. Since derivatives respect addition and scalar multiples, differentiation is a linear transformation.

Non-Example

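The map $S: \mathbb{R}^2 \to \mathbb{R}^2$ defined by

$$S(x,y) = (x^2, y^2)$$

is not linear, because $S(u+v) \neq S(u) + S(v)$ in general. For instance, with $u = v = (1,0)$ we get $S(u+v) = (4,0)$ but $S(u) + S(v) = (2,0)$.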
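A quick numerical experiment can make the two defining conditions concrete. The sketch below, in plain Python, tests additivity and homogeneity on a handful of sample vectors for the scaling map of Example 5.1.1 and for the squaring map above. The function names and the sample values are assumptions made for illustration. Passing such a test on finitely many samples does not prove linearity, but a single failure does disprove it.

```python
def scale(v):
    """T(x, y) = (2x, 2y), the scaling map of Example 5.1.1."""
    x, y = v
    return (2 * x, 2 * y)

def square(v):
    """S(x, y) = (x^2, y^2), the non-example."""
    x, y = v
    return (x * x, y * y)

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def smul(c, v):
    return tuple(c * a for a in v)

def looks_linear(f, samples, scalars):
    """Return False as soon as additivity or homogeneity fails on a sample."""
    for u in samples:
        for v in samples:
            if f(add(u, v)) != add(f(u), f(v)):
                return False
        for c in scalars:
            if f(smul(c, u)) != smul(c, f(u)):
                return False
    return True

samples = [(1, 0), (0, 1), (2, -3), (-1, 4)]   # arbitrary test vectors
scalars = [-2, 0, 3]                           # arbitrary test scalars

print(looks_linear(scale, samples, scalars))   # True
print(looks_linear(square, samples, scalars))  # False
```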

Geometric Interpretation

Linear transformations are exactly those that preserve the origin, lines through the origin, and proportions along those lines. They include familiar operations: scaling, rotations, reflections, shears, and projections. Nonlinear transformations bend or curve space, breaking these properties.

Why this matters

Linear transformations unify geometry, algebra, and computation. They explain how matrices act on vectors, how data can be rotated or projected, and how systems evolve under linear rules. Much of linear algebra is devoted to understanding these transformations, their representations, and their invariants.

Exercises 5.1

  1. Verify that $T(x,y) = (3x - y, 2y)$ is a linear transformation on $\mathbb{R}^2$.
  2. Show that $T(x,y) = (x+1, y)$ is not linear. Which axiom fails?
  3. Prove that if $T$ and $S$ are linear transformations, then so is $T+S$.
  4. Give an example of a linear transformation from $\mathbb{R}^3$ to $\mathbb{R}^2$.
  5. Let $T: \mathbb{R}[x] \to \mathbb{R}[x]$ be integration:
     $$T(p(x)) = \int_0^x p(t)\,dt.$$
     Prove that $T$ is a linear transformation.

5.2 Matrix Representation of Linear Maps

Every linear transformation between finite-dimensional vector spaces can be represented by a matrix. This correspondence is one of the central insights of linear algebra: it lets us use the tools of matrix arithmetic to study abstract transformations.

From Linear Map to Matrix

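Let $T: \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. Choose the standard basis $\{e_1, \dots, e_n\}$ of $\mathbb{R}^n$, where $e_i$ has a 1 in the $i$-th position and 0 elsewhere.

The action of $T$ on each basis vector determines the entire transformation:

$$T(e_j) = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}.$$

Placing these outputs as columns gives the matrix of $T$:

$$[T] = A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.$$

Then for any vector $x \in \mathbb{R}^n$:

$$T(x) = Ax.$$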
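The column-by-column construction can be sketched in a few lines of plain Python. The map `T` below is a hypothetical example chosen for illustration, and `matrix_of` and `mat_vec` are illustrative helper names. The sketch assumes that the function passed in really is linear.

```python
def matrix_of(T, n):
    """Build the matrix of a linear map T: R^n -> R^m, one column per basis vector."""
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]   # j-th standard basis vector
        columns.append(list(T(e_j)))                   # T(e_j) becomes column j
    m = len(columns[0])
    # columns[j][i] is the entry in row i, column j.
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def T(v):
    """Hypothetical linear map T(x, y, z) = (x + 2y, 3z)."""
    x, y, z = v
    return (x + 2 * y, 3 * z)

A = matrix_of(T, 3)
print(A)                     # [[1, 2, 0], [0, 0, 3]]

x = [4, -1, 2]
print(mat_vec(A, x))         # [2, 6]
print(list(T(x)))            # [2, 6], the same vector
```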

Examples

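Example 5.2.1. Scaling in $\mathbb{R}^2$.
Let $T(x,y) = (2x, 3y)$. Then

$$T(e_1) = (2,0), \quad T(e_2) = (0,3).$$

So the matrix is

$$[T] = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}.$$

Example 5.2.2. Rotation in the plane.
The rotation transformation $R_\theta(x,y) = (x\cos\theta - y\sin\theta,\; x\sin\theta + y\cos\theta)$ has matrix

$$[R_\theta] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$

Example 5.2.3. Projection onto the x-axis.
The map $P(x,y) = (x,0)$ corresponds to

$$[P] = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.$$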

Change of Basis

Matrix representations depend on the chosen basis. If $B$ and $C$ are bases of $\mathbb{R}^n$ and $\mathbb{R}^m$, then the matrix of $T: \mathbb{R}^n \to \mathbb{R}^m$ with respect to these bases is obtained by expressing $T(v_j)$ in terms of $C$ for each $v_j \in B$. Changing bases corresponds to multiplying the matrix on the left and right by the appropriate change-of-basis matrices.

Geometric Interpretation

Matrices are not just convenient notation: once a basis is fixed, they are linear maps. Every rotation, reflection, projection, shear, or scaling corresponds to multiplying by a specific matrix. Thus, studying linear transformations reduces to studying their matrices.

Why this matters

Matrix representations make linear transformations computable. They connect abstract definitions to explicit calculations, enabling algorithms for solving systems, finding eigenvalues, and performing decompositions. Applications from graphics to machine learning depend on this translation.

Exercises 5.2

  1. Find the matrix representation of $T: \mathbb{R}^2 \to \mathbb{R}^2$, $T(x,y) = (x+y, x-y)$.
  2. Determine the matrix of the linear transformation $T: \mathbb{R}^3 \to \mathbb{R}^2$, $T(x,y,z) = (x+z, y-2z)$.
  3. What matrix represents reflection across the line $y = x$ in $\mathbb{R}^2$?
  4. Show that the matrix of the identity transformation on $\mathbb{R}^n$ is $I_n$.
  5. For the differentiation map $D: \mathbb{R}_2[x] \to \mathbb{R}_1[x]$, where $\mathbb{R}_k[x]$ is the space of polynomials of degree at most $k$, find the matrix of $D$ relative to the bases $\{1, x, x^2\}$ and $\{1, x\}$.

5.3 Kernel and Image

To understand a linear transformation deeply, we must examine what it kills and what it produces. These ideas are captured by the kernel and the image, two fundamental subspaces associated with any linear map.

The Kernel

The kernel (or null space) of a linear transformation $T: V \to W$ is the set of all vectors in $V$ that map to the zero vector in $W$:

$$\ker(T) = \{v \in V \mid T(v) = 0\}.$$

The kernel is always a subspace of $V$. It measures the degeneracy of the transformation: the directions that collapse to nothing.

Example 5.3.1.
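Let $T: \mathbb{R}^3 \to \mathbb{R}^2$ be defined by

$$T(x,y,z) = (x+y, y+z).$$

In matrix form,

$$[T] = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}.$$

To find the kernel, solve

$$\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$

This gives the equations $x + y = 0$, $y + z = 0$. Hence $x = -y$, $z = -y$. The kernel is

$$\ker(T) = \{(t, -t, t) \mid t \in \mathbb{R}\},$$

a line in $\mathbb{R}^3$.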

The Image

The image (or range) of a linear transformation $T: V \to W$ is the set of all outputs:

$$\operatorname{im}(T) = \{T(v) \mid v \in V\} \subseteq W.$$

Equivalently, it is the span of the columns of the representing matrix. The image is always a subspace of $W$.

Example 5.3.2.
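For the same transformation as above,

$$[T] = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix},$$

the columns are $(1,0)$, $(1,1)$, and $(0,1)$. Since $(1,1) = (1,0) + (0,1)$, the image is

$$\operatorname{im}(T) = \operatorname{span}\{(1,0),(0,1)\} = \mathbb{R}^2.$$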

Dimension Formula (Rank–Nullity Theorem)

For a linear transformation $T: V \to W$ with $V$ finite-dimensional,

$$\dim(\ker(T)) + \dim(\operatorname{im}(T)) = \dim(V).$$

This fundamental result connects the lost directions (kernel) with the achieved directions (image).
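For a concrete matrix, the two dimensions can be read off from Gaussian elimination: pivot columns count the dimension of the image, and the remaining free columns count the dimension of the kernel. The sketch below uses the matrix of Example 5.3.1; the function `pivot_columns` is an illustrative helper written for this purpose, not a routine from any particular library.

```python
from fractions import Fraction

def pivot_columns(A):
    """Return the indices of the pivot columns found by Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    pivots = []
    r = 0  # row where the next pivot will be placed
    for c in range(cols):
        if r == rows:
            break
        hit = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if hit is None:
            continue  # no pivot here: column c is a free column
        M[r], M[hit] = M[hit], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return pivots

A = [[1, 1, 0],
     [0, 1, 1]]
n = len(A[0])

pivots = pivot_columns(A)
free = [c for c in range(n) if c not in pivots]

dim_image = len(pivots)    # 2
dim_kernel = len(free)     # 1
print(dim_kernel + dim_image == n)   # True

# The kernel vector (1, -1, 1) from Example 5.3.1 is indeed sent to zero.
print([sum(a * x for a, x in zip(row, [1, -1, 1])) for row in A])   # [0, 0]
```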

Geometric Interpretation

Why this matters

Kernel and image capture the essence of a linear map. They classify transformations, explain when systems have unique or infinite solutions, and form the backbone of important results like the Rank–Nullity Theorem, diagonalization, and spectral theory.

Exercises 5.3

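  1. Find the kernel and image of $T: \mathbb{R}^2 \to \mathbb{R}^2$, $T(x,y) = (x-y, x+y)$.
  2. Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \end{bmatrix}$. Find bases for $\ker(A)$ and $\operatorname{im}(A)$.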
  3. For the projection map P(x,y,z)=(x,y,0), describe the kernel and image.
  4. Prove that ker(T) and im(T) are always subspaces.
  5. Verify the Rank–Nullity Theorem for the transformation in Example 5.3.1.

5.4 Change of Basis

Linear transformations can look very different depending on the coordinate system we use. The process of rewriting vectors and transformations relative to a new basis is called a change of basis. This concept lies at the heart of diagonalization, orthogonalization, and many computational techniques.

Coordinate Change

Suppose $V$ is an $n$-dimensional vector space, and let $B = \{v_1, \dots, v_n\}$ be a basis. Every vector $x \in V$ has a coordinate vector $[x]_B \in \mathbb{R}^n$.

If $P$ is the change-of-basis matrix from $B$ to the standard basis, then

$$x = P[x]_B.$$

Equivalently,

$$[x]_B = P^{-1}x.$$

Here, $P$ has the basis vectors of $B$ as its columns:

$$P = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}.$$

Transformation of Matrices

Let $T: V \to V$ be a linear transformation. Suppose its matrix in the standard basis is $A$. In the basis $B$, the representing matrix becomes

$$[T]_B = P^{-1}AP.$$

Thus, changing basis corresponds to a similarity transformation of the matrix.
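The similarity formula is easy to check numerically on a small case. The sketch below uses an illustrative $2 \times 2$ matrix chosen for this purpose (it is an assumption of the sketch, not taken from the text) together with the basis $\{(1,1),(1,-1)\}$; the helper names are likewise illustrative.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse_2x2(P):
    """Invert an integer 2x2 matrix with the adjugate formula."""
    (a, b), (c, d) = P
    det = a * d - b * c
    if det == 0:
        raise ValueError("P is singular, so its columns are not a basis")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

# Illustrative matrix in the standard basis (an assumption of this sketch).
A = [[2, 2],
     [2, 2]]

# Basis vectors (1, 1) and (1, -1) as the columns of P.
P = [[1, 1],
     [1, -1]]

T_B = mat_mul(inverse_2x2(P), mat_mul(A, P))   # [T]_B = P^{-1} A P
print(T_B)   # diagonal, with entries 4 and 0 on the diagonal
```

In this basis the matrix is diagonal because $(1,1)$ is sent to $4\,(1,1)$ and $(1,-1)$ is sent to the zero vector.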

Example

Example 5.4.1.
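Let $T: \mathbb{R}^2 \to \mathbb{R}^2$ be given by

$$T(x,y) = (2x + 2y,\; 2x + 2y).$$

In the standard basis, its matrix is

$$A = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}.$$

Now consider the basis $B = \{(1,1),(1,-1)\}$. The change-of-basis matrix is

$$P = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$

Then

$$[T]_B = P^{-1}AP.$$

Since $A(1,1)^T = (4,4)^T = 4\,(1,1)^T$ and $A(1,-1)^T = (0,0)^T$, computing gives

$$[T]_B = \begin{bmatrix} 4 & 0 \\ 0 & 0 \end{bmatrix}.$$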

In this new basis, the transformation is diagonal: one direction is scaled by 4, the other collapsed to 0.

Geometric Interpretation

Change of basis is like rotating or skewing your coordinate grid. The underlying transformation does not change, but its description in numbers becomes simpler or more complicated depending on the basis. Finding a basis that simplifies a transformation (often a diagonal basis) is a key theme in linear algebra.

Why this matters

Change of basis connects the abstract notion of similarity to practical computation. It is the tool that allows us to diagonalize matrices, compute eigenvalues, and simplify complex transformations. In applications, it corresponds to choosing a more natural coordinate system, whether in geometry, physics, or machine learning.

Exercises 5.4

  1. Let $A = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}$. Compute its representation in the basis $\{(1,0),(1,1)\}$.
  2. Find the change-of-basis matrix from the standard basis of $\mathbb{R}^2$ to $\{(2,1),(1,1)\}$.
  3. Prove that similar matrices (related by $P^{-1}AP$) represent the same linear transformation under different bases.
  4. Diagonalize the matrix $A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ in the basis $\{(1,1),(1,-1)\}$.
  5. In $\mathbb{R}^3$, let $B = \{(1,0,0),(1,1,0),(1,1,1)\}$. Construct the change-of-basis matrix $P$ and compute $P^{-1}$.

Chapter 6. Determinants

6.1 Motivation and Geometric Meaning

Determinants are numerical values associated with square matrices. At first they may appear as a complicated formula, but their importance comes from what they measure: determinants encode scaling, orientation, and invertibility of linear transformations. They bridge algebra and geometry.

Determinants of 2×2 Matrices

For a $2 \times 2$ matrix

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix},$$

the determinant is defined as

$$\det(A) = ad - bc.$$

Geometric meaning: If $A$ represents a linear transformation of the plane, then $|\det(A)|$ is the area scaling factor. For example, if $\det(A) = 2$, areas of shapes are doubled. If $\det(A) = 0$, the transformation collapses the plane onto a line (or a single point): all area is lost.

Determinants of 3×3 Matrices

For

$$A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix},$$

the determinant can be computed as

$$\det(A) = a(ei - fh) - b(di - fg) + c(dh - eg).$$

Geometric meaning: In $\mathbb{R}^3$, $|\det(A)|$ is the volume scaling factor. If $\det(A) < 0$, orientation is reversed (a handedness flip), such as turning a right-handed coordinate system into a left-handed one.

General Case

For $A \in \mathbb{R}^{n \times n}$, the determinant is a scalar that measures how the linear transformation given by $A$ scales $n$-dimensional volume.

Visual Examples

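  1. Shear in $\mathbb{R}^2$:
    $$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}.$$
    Then $\det(A) = 1$. The transformation slants the unit square into a parallelogram but preserves area.

  2. Projection in $\mathbb{R}^2$:
    $$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.$$
    Then $\det(A) = 0$. The unit square collapses into a line segment: area vanishes.

  3. Rotation in $\mathbb{R}^2$:
    $$R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$
    Then $\det(R_\theta) = \cos^2\theta + \sin^2\theta = 1$. Rotations preserve area and orientation.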
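The area interpretation of these three examples can be checked directly: map the corners of the unit square through each matrix and measure the area of the resulting polygon. The sketch below does this in plain Python using the shoelace formula for polygon area, which is a technique brought in here for the check and not introduced in the text; the rotation angle is an arbitrary choice.

```python
import math

def det2(A):
    """Determinant ad - bc of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    return a * d - b * c

def apply(A, p):
    """Apply the 2x2 matrix A to the point p."""
    return (A[0][0] * p[0] + A[0][1] * p[1],
            A[1][0] * p[0] + A[1][1] * p[1])

def polygon_area(pts):
    """Area of a polygon from its vertices in order (shoelace formula)."""
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
theta = math.pi / 6   # arbitrary rotation angle

examples = {
    "shear": [[1, 1], [0, 1]],
    "projection": [[1, 0], [0, 0]],
    "rotation": [[math.cos(theta), -math.sin(theta)],
                 [math.sin(theta), math.cos(theta)]],
}

for name, A in examples.items():
    image = [apply(A, p) for p in unit_square]
    # The two numbers printed on each line should agree up to rounding.
    print(name, abs(det2(A)), polygon_area(image))
```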

Why this matters

The determinant is not just a formula: it is a measure of transformation. It tells us whether a matrix is invertible, how it distorts space, and whether it flips orientation. This geometric insight makes the determinant indispensable in analysis, geometry, and applied mathematics.

Exercises 6.1

  1. Compute the determinant of $\begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix}$. What area scaling factor does it represent?
  2. Find the determinant of the shear matrix $\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$. What happens to the area of the unit square?
  3. For the $3 \times 3$ matrix $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$, compute the determinant. How does it scale volume in $\mathbb{R}^3$?
  4. Show that any rotation matrix in $\mathbb{R}^2$ has determinant 1.
  5. Give an example of a $2 \times 2$ matrix with determinant $-1$. What geometric action does it represent?

6.2 Properties of Determinants

Beyond their geometric meaning, determinants satisfy a collection of algebraic rules that make them powerful tools in linear algebra. These properties allow us to compute efficiently, test invertibility, and understand how determinants behave under matrix operations.

Basic Properties

Let $A, B \in \mathbb{R}^{n \times n}$, and let $c \in \mathbb{R}$. Then:

  1. Identity:
     $\det(I_n) = 1.$

  2. Triangular matrices:
     If $A$ is upper or lower triangular, then $\det(A) = a_{11} a_{22} \cdots a_{nn}.$

  3. Row/column swap:
     Interchanging two rows (or columns) multiplies the determinant by $-1$.

  4. Row/column scaling:
     Multiplying a row (or column) by a scalar $c$ multiplies the determinant by $c$.

  5. Row/column addition:
     Adding a multiple of one row (or column) to another does not change the determinant.

  6. Transpose:
     $\det(A^T) = \det(A).$

  7. Multiplicativity:
     $\det(AB) = \det(A)\det(B).$

  8. Invertibility:
     $A$ is invertible if and only if $\det(A) \neq 0$.
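Three of these rules already give a practical algorithm: reduce the matrix to triangular form using row swaps (each flips the sign) and row additions (which change nothing), then multiply the diagonal entries. The following plain-Python sketch implements that idea with exact fractions and also checks multiplicativity on two small matrices; the function names are illustrative.

```python
from fractions import Fraction

def det_by_elimination(A):
    """Determinant via row reduction to upper triangular form."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    sign = 1
    for c in range(n):
        pivot = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)          # no pivot in this column: singular
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            sign = -sign                # a row swap multiplies det by -1
        for i in range(c + 1, n):
            factor = M[i][c] / M[c][c]
            # Adding a multiple of one row to another leaves det unchanged.
            M[i] = [a - factor * b for a, b in zip(M[i], M[c])]
    product = Fraction(1)
    for i in range(n):
        product *= M[i][i]              # triangular: product of the diagonal
    return sign * product

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

B = [[1, 2], [3, 4]]
C = [[0, 1], [1, 0]]

print(det_by_elimination(B))               # -2
print(det_by_elimination(C))               # -1
print(det_by_elimination(mat_mul(C, B)))   # 2
```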

Example Computations

Example 6.2.1.
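For

$$A = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 3 & 0 \\ 1 & 4 & 5 \end{bmatrix},$$

$A$ is lower triangular, so

$$\det(A) = 2 \cdot 3 \cdot 5 = 30.$$

Example 6.2.2.
Let

$$B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad C = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$

Then

$$\det(B) = 1 \cdot 4 - 2 \cdot 3 = -2, \quad \det(C) = -1.$$

Since $CB$ is obtained by swapping rows of $B$,

$$\det(CB) = -\det(B) = 2.$$

This matches the multiplicativity rule: $\det(CB) = \det(C)\det(B) = (-1)(-2) = 2$.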

Geometric Insights

These properties make determinants both computationally manageable and geometrically interpretable.

Why this matters

Determinant properties connect computation with geometry and theory. They explain why Gaussian elimination works, why invertibility is equivalent to nonzero determinant, and why determinants naturally arise in areas like volume computation, eigenvalue theory, and differential equations.

Exercises 6.2

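  1. Compute the determinant of
     $$A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \\ 0 & 0 & 2 \end{bmatrix}.$$
  2. Show that if two rows of a square matrix are identical, then its determinant is zero.
  3. Verify $\det(A^T) = \det(A)$ for
     $$A = \begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix}.$$
  4. If $A$ is invertible, prove that
     $$\det(A^{-1}) = \frac{1}{\det(A)}.$$
  5. Suppose $A$ is a $3 \times 3$ matrix with $\det(A) = 5$. What is $\det(2A)$?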

6.3 Cofactor Expansion

While determinants of small matrices can be computed directly from formulas, larger matrices require a systematic method. The cofactor expansion (also known as Laplace expansion) provides a recursive way to compute determinants by breaking them into smaller ones.

Minors and Cofactors

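For an $n \times n$ matrix $A = [a_{ij}]$, the minor $M_{ij}$ is the determinant of the $(n-1) \times (n-1)$ matrix obtained by deleting row $i$ and column $j$ of $A$. The cofactor $C_{ij}$ is the minor with a sign attached:

$$C_{ij} = (-1)^{i+j} M_{ij}.$$

The sign factor $(-1)^{i+j}$ alternates in a checkerboard pattern:

$$\begin{bmatrix} + & - & + & \cdots \\ - & + & - & \cdots \\ + & - & + & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}.$$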

Cofactor Expansion Formula

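The determinant of $A$ can be computed by expanding along any row or any column:

$$\det(A) = \sum_{j=1}^{n} a_{ij} C_{ij} \quad \text{(expansion along row } i\text{)}, \qquad \det(A) = \sum_{i=1}^{n} a_{ij} C_{ij} \quad \text{(expansion along column } j\text{)}.$$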

Example

Example 6.3.1.
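Compute $\det(A)$ for

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 1 & 0 & 6 \end{bmatrix}.$$

Expand along the first row:

$$\det(A) = 1 \cdot C_{11} + 2 \cdot C_{12} + 3 \cdot C_{13}.$$

The cofactors are

$$C_{11} = \det\begin{bmatrix} 4 & 5 \\ 0 & 6 \end{bmatrix} = 24, \quad C_{12} = -\det\begin{bmatrix} 0 & 5 \\ 1 & 6 \end{bmatrix} = 5, \quad C_{13} = \det\begin{bmatrix} 0 & 4 \\ 1 & 0 \end{bmatrix} = -4.$$

Thus,

$$\det(A) = 1(24) + 2(5) + 3(-4) = 24 + 10 - 12 = 22.$$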
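Because the expansion expresses a determinant through smaller determinants, it translates directly into a recursive function. The sketch below expands along the first row, as in the example; indices are zero-based in the code, so the sign $(-1)^{i+j}$ for the first row reduces to $(-1)^j$. The function names are illustrative.

```python
def minor(A, i, j):
    """Matrix obtained from A by deleting row i and column j (zero-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(n))

A = [[1, 2, 3],
     [0, 4, 5],
     [1, 0, 6]]

cofactors = [(-1) ** j * det(minor(A, 0, j)) for j in range(3)]
print(cofactors)   # [24, 5, -4]
print(det(A))      # 22
```

The running time of this approach grows like $n!$, so it is practical only for small matrices.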

Properties of Cofactor Expansion

  1. Expansion along any row or column yields the same result.
  2. The cofactor expansion provides a recursive definition of determinant: a determinant of size $n$ is expressed in terms of determinants of size $n-1$.
  3. Cofactors are fundamental in constructing the adjugate matrix, which gives a formula for inverses:
     $$A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A), \quad \text{where } \operatorname{adj}(A) = [C_{ji}].$$

Geometric Interpretation

Cofactor expansion breaks down the determinant into contributions from sub-volumes defined by fixing one row or column at a time. Each cofactor measures how that row/column influences the overall volume scaling.

Why this matters

Cofactor expansion generalizes the small-matrix formulas and provides a conceptual definition of determinants. While not the most efficient way to compute determinants for large matrices, it is essential for theory, proofs, and connections to adjugates, Cramer’s rule, and classical geometry.

Exercises 6.3

  1. Compute the determinant of
     $$\begin{bmatrix} 2 & 0 & 1 \\ 3 & 1 & 4 \\ 1 & 2 & 0 \end{bmatrix}$$
     by cofactor expansion along the first column.
  2. Verify that expanding along the second row of Example 6.3.1 gives the same determinant.
  3. Prove that expansion along any row gives the same value.
  4. Show that if a row of a matrix is zero, then its determinant is zero.
  5. Use cofactor expansion to prove that $\det(A) = \det(A^T)$.

6.4 Applications (Volume, Invertibility Test)

Determinants are not merely algebraic curiosities; they have concrete geometric and computational uses. Two of the most important applications are measuring volumes and testing invertibility of matrices.

Determinants as Volume Scalers

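Given vectors $v_1, v_2, \dots, v_n \in \mathbb{R}^n$, arrange them as columns of a matrix:

$$A = \begin{bmatrix} | & | & & | \\ v_1 & v_2 & \cdots & v_n \\ | & | & & | \end{bmatrix}.$$

Then $|\det(A)|$ equals the volume of the parallelepiped spanned by these vectors.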

Example 6.4.1.
Let

$$v_1 = (1,0,0), \quad v_2 = (1,1,0), \quad v_3 = (1,1,1).$$

Then

$$A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}, \quad \det(A) = 1.$$

So the parallelepiped has volume 1, even though the vectors are not orthogonal.

Invertibility Test

A square matrix $A$ is invertible if and only if $\det(A) \neq 0$.

Example 6.4.2.
The matrix

$$B = \begin{bmatrix} 2 & 4 \\ 1 & 2 \end{bmatrix}$$

has determinant $\det(B) = 2 \cdot 2 - 4 \cdot 1 = 0$.
Thus, $B$ is not invertible. Geometrically, the two column vectors are collinear, spanning only a line in $\mathbb{R}^2$.

Cramer’s Rule

Determinants also provide an explicit formula for solving systems of linear equations when the matrix is invertible.
For $Ax = b$ with $A \in \mathbb{R}^{n \times n}$:

$$x_i = \frac{\det(A_i)}{\det(A)},$$

where $A_i$ is obtained by replacing the $i$-th column of $A$ with $b$.
While inefficient computationally, Cramer’s rule highlights the determinant’s role in solutions and uniqueness.
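As a small illustration, Cramer's rule can be coded in a few lines on top of a recursive determinant. The system solved below ($2x + y = 5$, $x + 3y = 10$) is a hypothetical example chosen for this sketch, and the code assumes integer entries so that exact fractions can be used.

```python
from fractions import Fraction

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def cramer(A, b):
    """Solve Ax = b for a square integer matrix A with nonzero determinant."""
    d = det(A)
    if d == 0:
        raise ValueError("det(A) = 0: Cramer's rule does not apply")
    solution = []
    for i in range(len(A)):
        # A_i: replace column i of A by the right-hand side b.
        A_i = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(A)]
        solution.append(Fraction(det(A_i), d))
    return solution

# Hypothetical system: 2x + y = 5, x + 3y = 10.
A = [[2, 1],
     [1, 3]]
b = [5, 10]
print(cramer(A, b))   # x = 1, y = 3, stored as Fractions
```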

Orientation

The sign of $\det(A)$ indicates whether a transformation preserves or reverses orientation. For example, a reflection in the plane has determinant $-1$, flipping handedness.

Why this matters

Determinants condense key information: they measure scaling, test invertibility, and track orientation. These insights are indispensable in geometry (areas and volumes), analysis (Jacobian determinants in calculus), and computation (solving systems and checking singularity).

Exercises 6.4

  1. Compute the area of the parallelogram spanned by (2,1) and (1,3).
  2. Find the volume of the parallelepiped spanned by (1,0,0),(1,1,0),(1,1,1).
  3. Determine whether the matrix $\begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}$ is invertible. Justify using determinants.
  4. Use Cramer’s rule to solve
     $$\begin{cases} x + y = 3, \\ 2x - y = 0. \end{cases}$$
  5. Explain geometrically why a determinant of zero implies no inverse exists.

Chapter 7. Inner Product Spaces

7.1 Inner Products and Norms

To extend the geometric ideas of length, distance, and angle beyond R2 and R3, we introduce inner products. Inner products provide a way of measuring similarity between vectors, while norms derived from them measure length. These concepts are the foundation of geometry inside vector spaces.

Inner Product

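An inner product on a real vector space $V$ is a function

$$\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$$

that assigns to each pair of vectors $(u, v)$ a real number, subject to the following properties:

  1. Symmetry:
    $\langle u, v \rangle = \langle v, u \rangle.$

  2. Linearity in the first argument:
    $\langle au + bw, v \rangle = a\langle u, v \rangle + b\langle w, v \rangle.$

  3. Positive-definiteness:
    $\langle v, v \rangle \geq 0$, and equality holds if and only if $v = 0$.

The standard inner product on $\mathbb{R}^n$ is the dot product:

$$\langle u, v \rangle = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n.$$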

Norms

The norm of a vector is its length, defined in terms of the inner product:

$$\|v\| = \sqrt{\langle v, v \rangle}.$$

For the dot product in $\mathbb{R}^n$:

$$\|(x_1, x_2, \dots, x_n)\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}.$$

Angles Between Vectors

The inner product allows us to define the angle $\theta$ between two nonzero vectors $u, v$ by

$$\cos\theta = \frac{\langle u, v \rangle}{\|u\| \, \|v\|}.$$

Thus, two vectors are orthogonal if $\langle u, v \rangle = 0$.

Examples

Example 7.1.1.
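In $\mathbb{R}^2$, with $u = (1,2)$, $v = (3,4)$:

$$\langle u, v \rangle = 1 \cdot 3 + 2 \cdot 4 = 11, \quad \|u\| = \sqrt{1^2 + 2^2} = \sqrt{5}, \quad \|v\| = \sqrt{3^2 + 4^2} = 5.$$

So,

$$\cos\theta = \frac{11}{5\sqrt{5}}.$$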
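The same computation can be carried out numerically. This is a minimal sketch using the dot product on $\mathbb{R}^n$; the function names `inner`, `norm`, and `angle` are illustrative.

```python
import math

def inner(u, v):
    """Standard inner product (dot product) on R^n."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    """Length of v induced by the inner product."""
    return math.sqrt(inner(v, v))

def angle(u, v):
    """Angle in radians between two nonzero vectors."""
    cos_theta = inner(u, v) / (norm(u) * norm(v))
    # Clamp to [-1, 1] to guard against rounding just outside the valid range.
    return math.acos(max(-1.0, min(1.0, cos_theta)))

u = (1, 2)
v = (3, 4)

print(inner(u, v))                 # 11
print(norm(v))                     # 5.0
print(math.cos(angle(u, v)))       # close to 11 / (5 * sqrt(5))
print(math.degrees(angle(u, v)))   # the angle in degrees
```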

Example 7.1.2.
In the function space $C[0,1]$, the inner product

$$\langle f, g \rangle = \int_0^1 f(x) g(x)\,dx$$

defines a length

$$\|f\| = \sqrt{\int_0^1 f(x)^2\,dx}.$$

This generalizes geometry to infinite-dimensional spaces.

Geometric Interpretation

These concepts unify algebraic operations with geometric intuition.

Why this matters

Inner products and norms allow us to extend geometry into abstract vector spaces. They form the basis of orthogonality, projections, Fourier series, least squares approximation, and many applications in physics and machine learning.

Exercises 7.1

  1. Compute $\langle (2,1,3), (1,4,0) \rangle$. Then find the angle between them.
  2. Show that $\|(x,y)\| = \sqrt{x^2 + y^2}$ satisfies the properties of a norm.
  3. In $\mathbb{R}^3$, verify that $(1,1,0)$ and $(1,-1,0)$ are orthogonal.
  4. In $C[0,1]$, compute $\langle f, g \rangle$ for $f(x) = x$, $g(x) = 1$.
  5. Prove the Cauchy–Schwarz inequality:
     $$|\langle u, v \rangle| \leq \|u\| \, \|v\|.$$

7.2 Orthogonal Projections

One of the most useful applications of inner products is the notion of orthogonal projection. Projection allows us to approximate a vector by another lying in a subspace, minimizing error in the sense of distance. This idea underpins geometry, statistics, and numerical analysis.

Projection onto a Line

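Let $u \in \mathbb{R}^n$ be a nonzero vector. The line spanned by $u$ is

$$L = \{cu \mid c \in \mathbb{R}\}.$$

Given a vector $v$, the projection of $v$ onto $u$ is the vector in $L$ closest to $v$. Geometrically, it is the shadow of $v$ on the line.

The formula is

$$\operatorname{proj}_u(v) = \frac{\langle v, u \rangle}{\langle u, u \rangle}\, u.$$

The error vector $v - \operatorname{proj}_u(v)$ is orthogonal to $u$.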

Example 7.2.1

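Let $u = (1,2)$, $v = (3,1)$.

$$\langle v, u \rangle = 3 \cdot 1 + 1 \cdot 2 = 5, \quad \langle u, u \rangle = 1^2 + 2^2 = 5.$$

So

$$\operatorname{proj}_u(v) = \frac{5}{5}(1,2) = (1,2).$$

The error vector is $(3,1) - (1,2) = (2,-1)$, which is orthogonal to $(1,2)$.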
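The projection formula and the orthogonality of the error vector can be verified in a few lines. The sketch below uses exact fractions and assumes integer input vectors; `project_onto` is an illustrative name.

```python
from fractions import Fraction

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_onto(u, v):
    """Projection of v onto the line spanned by the nonzero integer vector u."""
    coeff = Fraction(inner(v, u), inner(u, u))
    return tuple(coeff * a for a in u)

u = (1, 2)
v = (3, 1)

p = project_onto(u, v)
error = tuple(a - b for a, b in zip(v, p))

print(p)                 # the vector (1, 2), stored as Fractions
print(error)             # the vector (2, -1)
print(inner(error, u))   # 0, so the error is orthogonal to u
```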

Projection onto a Subspace

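Suppose $W \subseteq \mathbb{R}^n$ is a subspace with orthonormal basis $\{w_1, \dots, w_k\}$. The projection of a vector $v$ onto $W$ is

$$\operatorname{proj}_W(v) = \langle v, w_1 \rangle w_1 + \cdots + \langle v, w_k \rangle w_k.$$

This is the unique vector in $W$ closest to $v$. The difference $v - \operatorname{proj}_W(v)$ is orthogonal to all of $W$.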

Least Squares Approximation

Orthogonal projection explains the method of least squares. To solve an overdetermined system $Ax \approx b$, we seek the $x$ that makes $Ax$ the projection of $b$ onto the column space of $A$. This gives the normal equations

$$A^T A x = A^T b.$$

Thus, least squares is just projection in disguise.
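To see the normal equations at work, the sketch below fits a straight line $y = c_0 + c_1 t$ to four data points. The data points are hypothetical values made up for illustration, and the helper names are likewise assumptions. The final check confirms that the residual $b - Ax$ is orthogonal to the columns of $A$, which is exactly the projection property.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_vec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def solve_2x2(M, rhs):
    """Solve a 2x2 integer system with the explicit determinant formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("normal equations are singular")
    return [Fraction(rhs[0] * d - b * rhs[1], det),
            Fraction(a * rhs[1] - c * rhs[0], det)]

# Hypothetical data points (t, y).
ts = [0, 1, 2, 3]
ys = [1, 2, 2, 4]

A = [[1, t] for t in ts]     # columns: constant term and t
At = transpose(A)

x = solve_2x2(mat_mul(At, A), mat_vec(At, ys))   # solves A^T A x = A^T b
residual = [y - fit for y, fit in zip(ys, mat_vec(A, x))]

print(x)                      # intercept 9/10 and slope 9/10
print(mat_vec(At, residual))  # both entries are 0: residual is orthogonal to col(A)
```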

Geometric Interpretation

Why this matters

Orthogonal projection is central in both pure and applied mathematics. It underlies the geometry of subspaces, the theory of Fourier series, regression in statistics, and approximation methods in numerical linear algebra. Whenever we fit data with a simpler model, projection is at work.

Exercises 7.2

  1. Compute the projection of (2,3) onto the vector (1,1).
  2. Show that $v - \operatorname{proj}_u(v)$ is orthogonal to $u$.
  3. Let $W = \operatorname{span}\{(1,0,0),(0,1,0)\} \subseteq \mathbb{R}^3$. Find the projection of $(1,2,3)$ onto $W$.
  4. Explain why least squares fitting corresponds to projection onto the column space of A.
  5. Prove that projection onto a subspace W is unique: there is exactly one closest vector in W to a given v.

7.3 Gram–Schmidt Process

The Gram–Schmidt process is a systematic way to turn any linearly independent set of vectors into an orthonormal basis. This is especially useful because orthonormal bases simplify computations: inner products become simple coordinate comparisons, and projections take clean forms.

The Idea

Given a linearly independent set of vectors $\{v_1, v_2, \dots, v_n\}$ in an inner product space, we want to construct an orthonormal set $\{u_1, u_2, \dots, u_n\}$ that spans the same subspace.

We proceed step by step:

  1. Start with $v_1$, normalize it to get $u_1$.
  2. Subtract from $v_2$ its projection onto $u_1$, leaving a vector orthogonal to $u_1$. Normalize to get $u_2$.
  3. For each $v_k$, subtract projections onto all previously constructed $u_1, \dots, u_{k-1}$, then normalize.

The Algorithm

For $k = 1, 2, \dots, n$:

$$w_k = v_k - \sum_{j=1}^{k-1} \langle v_k, u_j \rangle u_j, \qquad u_k = \frac{w_k}{\|w_k\|}.$$

The result $\{u_1, \dots, u_n\}$ is an orthonormal basis of the span of the original vectors.

Example 7.3.1

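Take $v_1 = (1,1,0)$, $v_2 = (1,0,1)$, $v_3 = (0,1,1)$ in $\mathbb{R}^3$.

  1. Normalize $v_1$:
     $$u_1 = \frac{1}{\sqrt{2}}(1,1,0).$$
  2. Subtract the projection of $v_2$ on $u_1$:
     $$w_2 = v_2 - \langle v_2, u_1 \rangle u_1, \qquad \langle v_2, u_1 \rangle = \frac{1}{\sqrt{2}}(1 \cdot 1 + 0 \cdot 1 + 1 \cdot 0) = \frac{1}{\sqrt{2}}.$$
     So
     $$w_2 = (1,0,1) - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}}(1,1,0) = (1,0,1) - \tfrac{1}{2}(1,1,0) = \left(\tfrac{1}{2}, -\tfrac{1}{2}, 1\right).$$
     Normalize:
     $$u_2 = \frac{1}{\sqrt{\tfrac{1}{4} + \tfrac{1}{4} + 1}} \left(\tfrac{1}{2}, -\tfrac{1}{2}, 1\right) = \frac{1}{\sqrt{3/2}} \left(\tfrac{1}{2}, -\tfrac{1}{2}, 1\right) = \frac{1}{\sqrt{6}}(1,-1,2).$$
  3. Subtract projections from $v_3$:
     $$w_3 = v_3 - \langle v_3, u_1 \rangle u_1 - \langle v_3, u_2 \rangle u_2.$$
     Here $\langle v_3, u_1 \rangle u_1 = \tfrac{1}{2}(1,1,0)$ and $\langle v_3, u_2 \rangle u_2 = \tfrac{1}{6}(1,-1,2)$, so
     $$w_3 = \left(-\tfrac{2}{3}, \tfrac{2}{3}, \tfrac{2}{3}\right), \qquad u_3 = \frac{1}{\sqrt{3}}(-1,1,1).$$

The result is an orthonormal basis of the span of $\{v_1, v_2, v_3\}$.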
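The procedure can be sketched in plain Python as follows. This is the classical form of the algorithm, written with floating-point arithmetic; the tolerance `tol` used to detect linearly dependent input is an assumption of this sketch, and numerically more robust variants exist for ill-conditioned inputs.

```python
import math

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal list spanning the same space as `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            coeff = inner(v, u)   # <v_k, u_j>, using the original v_k
            w = [wi - coeff * ui for wi, ui in zip(w, u)]
        length = math.sqrt(inner(w, w))
        if length < tol:
            raise ValueError("input vectors are linearly dependent")
        basis.append([wi / length for wi in w])
    return basis

U = gram_schmidt([(1, 1, 0), (1, 0, 1), (0, 1, 1)])
for u in U:
    print([round(x, 4) for x in u])

# Entry (i, j) of this table is <u_i, u_j>; it should be the identity matrix.
print([[round(inner(a, b), 10) for b in U] for a in U])
```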

Geometric Interpretation

Gram–Schmidt is like straightening out a set of vectors: you start with the original directions and adjust each new vector to be perpendicular to all previous ones. Then you scale to unit length. The process ensures orthogonality while preserving the span.

Why this matters

Orthonormal bases simplify inner products, projections, and computations in general. They make coordinate systems easier to work with and are crucial in numerical methods, QR decomposition, Fourier analysis, and statistics (orthogonal polynomials, principal component analysis).

Exercises 7.3

  1. Apply Gram–Schmidt to (1,0),(1,1) in R2.
  2. Orthogonalize (1,1,1),(1,0,1) in R3.
  3. Prove that each step of Gram–Schmidt yields a vector orthogonal to all previous ones.
  4. Show that Gram–Schmidt preserves the span of the original vectors.
  5. Explain how Gram–Schmidt leads to the QR decomposition of a matrix.

7.4 Orthonormal Bases

An orthonormal basis is a basis of a vector space in which all vectors are both orthogonal to each other and have unit length. Such bases are the most convenient possible coordinate systems: computations involving inner products, projections, and norms become exceptionally simple.

Definition

A set of vectors $\{u_1, u_2, \dots, u_n\}$ in an inner product space $V$ is called an orthonormal basis if

  1. $\langle u_i, u_j \rangle = 0$ whenever $i \neq j$ (orthogonality),
  2. $\|u_i\| = 1$ for all $i$ (normalization),
  3. The set spans $V$.

Examples

Example 7.4.1. In R2, the standard basis

e1=(1,0),e2=(0,1)

is orthonormal under the dot product.

Example 7.4.2. In R3, the standard basis

e1=(1,0,0),e2=(0,1,0),e3=(0,0,1)

is orthonormal.

Example 7.4.3. Fourier basis on functions:

$$\{1, \cos x, \sin x, \cos 2x, \sin 2x, \dots\}$$

is an orthogonal set in the space of square-integrable functions on $[-\pi, \pi]$ with inner product

$$\langle f, g \rangle = \int_{-\pi}^{\pi} f(x) g(x)\,dx.$$

After normalization, it becomes an orthonormal basis.

Properties

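  1. Coordinate simplicity: If $\{u_1, \dots, u_n\}$ is an orthonormal basis of $V$, then any vector $v \in V$ has coordinates
     $$[v] = \begin{bmatrix} \langle v, u_1 \rangle \\ \vdots \\ \langle v, u_n \rangle \end{bmatrix}.$$
     That is, coordinates are just inner products.

  2. Parseval’s identity: For any $v \in V$,
     $$\|v\|^2 = \sum_{i=1}^{n} |\langle v, u_i \rangle|^2.$$
  3. Projections: The orthogonal projection onto the span of $\{u_1, \dots, u_k\}$ is
     $$\operatorname{proj}(v) = \sum_{i=1}^{k} \langle v, u_i \rangle u_i.$$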
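These properties are easy to check numerically. The sketch below uses an illustrative orthonormal basis of $\mathbb{R}^2$, namely $\tfrac{1}{\sqrt{2}}(1,1)$ and $\tfrac{1}{\sqrt{2}}(1,-1)$, and an arbitrary test vector; both are choices made for this example.

```python
import math

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

s = 1 / math.sqrt(2)
basis = [(s, s), (s, -s)]   # illustrative orthonormal basis of R^2
v = (3.0, 4.0)              # arbitrary test vector

# Coordinates are just inner products with the basis vectors.
coords = [inner(v, u) for u in basis]

# Rebuild v from its coordinates: v = sum_i <v, u_i> u_i.
rebuilt = [sum(c * u[i] for c, u in zip(coords, basis)) for i in range(2)]

# Parseval: the squared length equals the sum of squared coordinates.
print(coords)
print(rebuilt)                                       # close to [3.0, 4.0]
print(inner(v, v), sum(c * c for c in coords))       # both close to 25
```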

Constructing Orthonormal Bases

Geometric Interpretation

An orthonormal basis is like a perfectly aligned and equally scaled coordinate system. Distances and angles are computed directly using coordinates without correction factors. They are the ideal rulers of linear algebra.

Why this matters

Orthonormal bases simplify every aspect of linear algebra: solving systems, computing projections, expanding functions, diagonalizing symmetric matrices, and working with Fourier series. In data science, principal component analysis produces orthonormal directions capturing maximum variance.

Exercises 7.4

  1. Verify that $(1/\sqrt{2})(1,1)$ and $(1/\sqrt{2})(1,-1)$ form an orthonormal basis of $\mathbb{R}^2$.
  2. Express $(3,4)$ in terms of the orthonormal basis $\{(1/\sqrt{2})(1,1), (1/\sqrt{2})(1,-1)\}$.
  3. Prove Parseval’s identity for Rn with the dot product.
  4. Find an orthonormal basis for the plane x+y+z=0 in R3.
  5. Explain why orthonormal bases are numerically more stable than arbitrary bases in computations.

Chapter 8. Eigenvalues and Eigenvectors

8.1 Definitions and Intuition

The concepts of eigenvalues and eigenvectors reveal the most fundamental behavior of linear transformations. They identify the special directions in which a transformation acts by simple stretching or compressing, without rotation or distortion.

Definition

Let $T: V \to V$ be a linear transformation on a vector space $V$. A nonzero vector $v \in V$ is called an eigenvector of $T$ if

$$T(v) = \lambda v$$

for some scalar $\lambda \in \mathbb{R}$ (or $\mathbb{C}$). The scalar $\lambda$ is the eigenvalue corresponding to $v$.

Equivalently, if $A$ is the matrix of $T$, then eigenvalues and eigenvectors satisfy

$$Av = \lambda v.$$

Basic Examples

Example 8.1.1.
Let

$$A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}.$$

Then

$$A(1,0)^T = 2(1,0)^T, \quad A(0,1)^T = 3(0,1)^T.$$

So $(1,0)$ is an eigenvector with eigenvalue 2, and $(0,1)$ is an eigenvector with eigenvalue 3.

Example 8.1.2.
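Rotation matrix in $\mathbb{R}^2$:

$$R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$

If $\theta \neq 0, \pi$, $R_\theta$ has no real eigenvalues: every vector is rotated, not scaled. Over $\mathbb{C}$, however, it has eigenvalues $e^{i\theta}$ and $e^{-i\theta}$.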

Algebraic Formulation

Eigenvalues arise from solving the characteristic equation:

$$\det(A - \lambda I) = 0.$$

The left-hand side is a polynomial in $\lambda$, called the characteristic polynomial. Its roots are the eigenvalues.
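
For a 2×2 matrix with entries a, b, c, d, the characteristic polynomial is the quadratic λ² − (a + d)λ + (ad − bc), so the eigenvalues follow from the quadratic formula. The sketch below assumes this 2×2 case only; `eigenvalues_2x2` is a hypothetical helper name, and `cmath` is used so that complex roots (as for a rotation) come out naturally.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Roots of det(A - lambda*I) = lambda^2 - (a+d)*lambda + (ad - bc)."""
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2

print(eigenvalues_2x2(2, 0, 0, 3))   # diagonal matrix from Example 8.1.1
print(eigenvalues_2x2(0, -1, 1, 0))  # rotation by 90 degrees
```

For larger matrices, root-finding on the characteristic polynomial is rarely used in practice; numerical libraries rely on iterative methods instead.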

Geometric Intuition

Applications in Geometry and Science

Why this matters

Eigenvalues and eigenvectors are a bridge between algebra and geometry. They provide a lens for understanding linear transformations in their simplest form. Nearly every application of linear algebra (differential equations, statistics, physics, computer science) relies on eigen-analysis.

Exercises 8.1

  1. Find the eigenvalues and eigenvectors of
    $\begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix}$.
  2. Show that every scalar multiple of an eigenvector is again an eigenvector for the same eigenvalue.
  3. Verify that the rotation matrix $R_\theta$ has no real eigenvalues unless $\theta = 0$ or $\pi$.
  4. Compute the characteristic polynomial of
    $\begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$.
  5. Explain geometrically what eigenvectors and eigenvalues represent for the shear matrix
    $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$.

8.2 Diagonalization

A central goal in linear algebra is to simplify the action of a matrix by choosing a good basis. Diagonalization is the process of rewriting a matrix so that it acts by simple scaling along independent directions. This makes computations such as powers, exponentials, and solving differential equations far easier.

Definition

A square matrix $A \in \mathbb{R}^{n \times n}$ is diagonalizable if there exists an invertible matrix $P$ such that

$$P^{-1} A P = D,$$

where $D$ is a diagonal matrix.

The diagonal entries of D are eigenvalues of A, and the columns of P are the corresponding eigenvectors.

When is a Matrix Diagonalizable?

Example 8.2.1

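Let

$$A = \begin{bmatrix} 4 & 1 \\ 0 & 2 \end{bmatrix}.$$

  1. Characteristic polynomial:

$$\det(A - \lambda I) = (4 - \lambda)(2 - \lambda).$$

So eigenvalues are $\lambda_1 = 4$, $\lambda_2 = 2$.

  2. Eigenvectors: solving $(A - 4I)v = 0$ gives $v_1 = (1, 0)$, and solving $(A - 2I)v = 0$ gives $v_2 = (1, -2)$.
  3. Construct $P = \begin{bmatrix} 1 & 1 \\ 0 & -2 \end{bmatrix}$, whose columns are $v_1$ and $v_2$. Then

$$P^{-1} A P = \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix}.$$

Thus, $A$ is diagonalizable.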

Why Diagonalize?

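If $A = P D P^{-1}$, then for every positive integer $k$ the inner factors $P^{-1}P$ cancel and

$$A^k = P D^k P^{-1}.$$

Since $D$ is diagonal, $D^k$ is easy to compute: raise each diagonal entry to the power $k$.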
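
The power formula can be checked on the matrix of Example 8.2.1. This is a minimal sketch in plain Python, assuming the eigenvector matrix with columns (1, 0) and (1, −2); `matmul` and `inverse_2x2` are small helpers defined here, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix via the adjugate formula (assumes det != 0)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[4, 1], [0, 2]]
P = [[1, 1], [0, -2]]             # columns are eigenvectors for 4 and 2
k = 3
D_k = [[4 ** k, 0], [0, 2 ** k]]  # powers of a diagonal matrix are entrywise

A_k = matmul(matmul(P, D_k), inverse_2x2(P))
A_k_direct = matmul(matmul(A, A), A)  # repeated multiplication, for comparison
print(A_k, A_k_direct)
```

Both routes give the same matrix; the diagonalized route needs only two matrix products regardless of how large `k` is.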

Non-Diagonalizable Example

Not all matrices can be diagonalized. For example,

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$$

has only one eigenvalue $\lambda = 1$, with eigenspace dimension 1. Since $n = 2$ but we only have 1 independent eigenvector, $A$ is not diagonalizable.

Geometric Interpretation

Diagonalization means we have found a basis of eigenvectors. In this basis, the matrix acts by simple scaling along each coordinate axis. It transforms complicated motion into independent 1D motions.

Why this matters

Diagonalization is a cornerstone of linear algebra. It simplifies computation, reveals structure, and is the starting point for the spectral theorem, Jordan form, and many applications in physics, engineering, and data science.

Exercises 8.2

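  1. Diagonalize

$$A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}.$$

  2. Determine whether

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$$

is diagonalizable. Why or why not?

  3. Find $A^5$ for

$$A = \begin{bmatrix} 4 & 1 \\ 0 & 2 \end{bmatrix}$$

using diagonalization.

  4. Show that any $n \times n$ matrix with $n$ distinct eigenvalues is diagonalizable.

  5. Explain why real symmetric matrices are always diagonalizable.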

8.3 Characteristic Polynomials

The key to finding eigenvalues is the characteristic polynomial of a matrix. This polynomial encodes the values of $\lambda$ for which the matrix $A - \lambda I$ fails to be invertible.

Definition

For an $n \times n$ matrix $A$, the characteristic polynomial is

$$p_A(\lambda) = \det(A - \lambda I).$$

The roots of $p_A(\lambda)$ are the eigenvalues of $A$.

Examples

Example 8.3.1.
Let

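$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$

Then

$$p_A(\lambda) = \det \begin{bmatrix} 2 - \lambda & 1 \\ 1 & 2 - \lambda \end{bmatrix} = (2 - \lambda)^2 - 1 = \lambda^2 - 4\lambda + 3.$$

Thus eigenvalues are $\lambda = 1, 3$.

Example 8.3.2.
For

$$A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$$

(rotation by 90°),

$$p_A(\lambda) = \det \begin{bmatrix} -\lambda & -1 \\ 1 & -\lambda \end{bmatrix} = \lambda^2 + 1.$$

Eigenvalues are $\lambda = \pm i$. No real eigenvalues exist, consistent with pure rotation.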

Example 8.3.3.
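For a triangular matrix

$$A = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 3 & 5 \\ 0 & 0 & 4 \end{bmatrix},$$

the matrix $A - \lambda I$ is also triangular, so its determinant is simply the product of its diagonal entries:

$$p_A(\lambda) = (2 - \lambda)(3 - \lambda)(4 - \lambda).$$

So eigenvalues are $2, 3, 4$.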

Properties

  1. The characteristic polynomial of an $n \times n$ matrix has degree $n$.
  2. The sum of the eigenvalues (counted with multiplicity) equals the trace of $A$:

$$\operatorname{tr}(A) = \lambda_1 + \dots + \lambda_n.$$

  3. The product of the eigenvalues (counted with multiplicity) equals the determinant of $A$:

$$\det(A) = \lambda_1 \cdots \lambda_n.$$

  4. Similar matrices have the same characteristic polynomial, hence the same eigenvalues.

Geometric Interpretation

The characteristic polynomial captures when $A - \lambda I$ collapses space: its determinant is zero precisely when the transformation $A - \lambda I$ is singular. Thus, eigenvalues mark the critical scaling where the matrix loses invertibility.

Why this matters

Characteristic polynomials provide the computational tool to extract eigenvalues. They connect matrix invariants (trace and determinant) with geometry, and form the foundation for diagonalization, spectral theorems, and stability analysis in dynamical systems.

Exercises 8.3

  1. Compute the characteristic polynomial of

$$A = \begin{bmatrix} 4 & 2 \\ 1 & 3 \end{bmatrix}.$$

  2. Verify that the sum of the eigenvalues of
    $\begin{bmatrix} 5 & 0 \\ 0 & 2 \end{bmatrix}$
    equals its trace, and their product equals its determinant.

  3. Show that for any triangular matrix, the eigenvalues are just the diagonal entries.

  4. Prove that if $A$ and $B$ are similar matrices, then $p_A(\lambda) = p_B(\lambda)$.

  5. Compute the characteristic polynomial of
    $\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$.

8.4 Applications (Differential Equations, Markov Chains)

Eigenvalues and eigenvectors are not only central to the theory of linear algebra; they are indispensable tools across mathematics and applied science. Two classic applications are solving systems of differential equations and analyzing Markov chains.

Linear Differential Equations

Consider the system

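$$\frac{dx}{dt} = Ax,$$

where $A$ is an $n \times n$ matrix and $x(t)$ is a vector-valued function.

If $v$ is an eigenvector of $A$ with eigenvalue $\lambda$, then the function

$$x(t) = e^{\lambda t} v$$

is a solution, since $\frac{d}{dt}\left(e^{\lambda t} v\right) = \lambda e^{\lambda t} v = A\left(e^{\lambda t} v\right)$.

When the eigenvectors of $A$ form a basis, any initial condition can be written as a combination of them, and the corresponding combination of eigenvector solutions solves the system.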

Example 8.4.1.
Let

$$A = \begin{bmatrix} 2 & 0 \\ 0 & -1 \end{bmatrix}.$$

Then eigenvalues are $2$ and $-1$ with eigenvectors $(1,0)$ and $(0,1)$. Solutions are

$$x(t) = c_1 e^{2t}(1,0) + c_2 e^{-t}(0,1).$$

Thus one component grows exponentially, the other decays.
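
The claim that this combination solves the system can be sanity-checked numerically. The sketch below takes the diagonal entries of A to be 2 and −1 (the growing and decaying components described above), picks the constants c1 = c2 = 1 arbitrarily, and compares a finite-difference derivative with Ax; `residual` is a hypothetical helper name.

```python
import math

A = [[2.0, 0.0], [0.0, -1.0]]

def x(t, c1=1.0, c2=1.0):
    """Solution built from the two eigenvector modes; c1, c2 are arbitrary constants."""
    return (c1 * math.exp(2 * t), c2 * math.exp(-t))

def residual(t, h=1e-6):
    """Largest entry of |dx/dt - A x| using a central finite difference."""
    xp, xm, x0 = x(t + h), x(t - h), x(t)
    deriv = [(p - m) / (2 * h) for p, m in zip(xp, xm)]
    Ax = [sum(A[i][j] * x0[j] for j in range(2)) for i in range(2)]
    return max(abs(d - a) for d, a in zip(deriv, Ax))

print(residual(0.5))
```

The residual is not exactly zero because the derivative is approximated, but it should be tiny compared with the size of the solution.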

Markov Chains

A Markov chain is described by a stochastic matrix $P$, where each column sums to 1 and entries are nonnegative. If $x_k$ represents the probability distribution after $k$ steps, then

$$x_{k+1} = P x_k.$$

Iterating gives

$$x_k = P^k x_0.$$

Understanding long-term behavior reduces to analyzing powers of $P$.

Example 8.4.2.
Consider

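$$P = \begin{bmatrix} 0.9 & 0.5 \\ 0.1 & 0.5 \end{bmatrix}.$$

Eigenvalues are $\lambda_1 = 1$, $\lambda_2 = 0.4$. The eigenvector for $\lambda = 1$ is proportional to $(5,1)$.
Normalizing so that the entries sum to 1 gives the steady state

$$\pi = \left(\tfrac{5}{6}, \tfrac{1}{6}\right).$$

Because $|\lambda_2| < 1$, the component of $x_0$ along the second eigenvector shrinks by a factor of $0.4$ at every step. Thus, regardless of the starting distribution, the chain converges to $\pi$.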
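
Convergence to the steady state can be observed by simply iterating the chain. This is a minimal sketch in plain Python; the starting distribution and the number of steps (50) are arbitrary choices, and `step` is a helper defined here.

```python
def step(P, x):
    """One step of the chain: x_{k+1} = P x_k (column-stochastic convention)."""
    return [sum(P[i][j] * x[j] for j in range(len(x))) for i in range(len(P))]

P = [[0.9, 0.5],
     [0.1, 0.5]]

x = [0.0, 1.0]          # arbitrary starting distribution
for _ in range(50):
    x = step(P, x)

print(x)                # compare with (5/6, 1/6)
```

Changing the starting distribution to any other pair of nonnegative numbers summing to 1 should lead to the same limit.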

Geometric Interpretation

Why this matters

Eigenvalue methods turn complex iterative or dynamical systems into tractable problems. In physics, engineering, and finance, they describe stability and resonance. In computer science and statistics, they power algorithms from Google’s PageRank to modern machine learning.

Exercises 8.4

  1. Solve $\frac{d}{dt}x = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix} x$.
  2. Show that if $A$ has complex eigenvalues $\alpha \pm i\beta$, then solutions of $\frac{d}{dt}x = Ax$ involve oscillations of frequency $\beta$.
  3. Find the steady-state distribution of

$$P = \begin{bmatrix} 0.7 & 0.2 \\ 0.3 & 0.8 \end{bmatrix}.$$

  4. Prove that for any stochastic matrix $P$, 1 is always an eigenvalue.
  5. Explain why all eigenvalues of a stochastic matrix satisfy $|\lambda| \le 1$.

Chapter 9. Quadratic Forms and Spectral Theorems

9.1 Quadratic Forms

A quadratic form is a polynomial of degree two in several variables, expressed neatly using matrices. Quadratic forms appear throughout mathematics: in optimization, geometry of conic sections, statistics (variance), and physics (energy functions).

Definition

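Let $A$ be an $n \times n$ symmetric matrix and $x \in \mathbb{R}^n$. The quadratic form associated with $A$ is

$$Q(x) = x^T A x.$$

Expanded,

$$Q(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j.$$

Because $A$ is symmetric ($a_{ij} = a_{ji}$), the cross-terms can be grouped naturally: each pair $i \neq j$ contributes $2 a_{ij} x_i x_j$.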

Examples

Example 9.1.1.
For

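$$A = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}, \quad x = \begin{bmatrix} x \\ y \end{bmatrix}, \quad Q(x,y) = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = 2x^2 + 2xy + 3y^2.$$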

Example 9.1.2.
The quadratic form

$$Q(x,y) = x^2 + y^2$$

corresponds to the matrix $A = I_2$. It measures squared Euclidean distance from the origin.

Example 9.1.3.
The conic section equation

$$4x^2 + 2xy + 5y^2 = 1$$

is described by the quadratic form $x^T A x = 1$ with

$$A = \begin{bmatrix} 4 & 1 \\ 1 & 5 \end{bmatrix}.$$

Diagonalization of Quadratic Forms

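By choosing a new basis consisting of eigenvectors of $A$, we can rewrite the quadratic form without cross terms.
Because $A$ is symmetric, it can be written as $A = P D P^{-1}$ with $D$ diagonal and $P$ orthogonal, so that $P^{-1} = P^T$ (this is the spectral theorem of Section 9.3). Then

$$Q(x) = x^T A x = (P^{-1}x)^T D (P^{-1}x).$$

Thus, in the new coordinates $y = P^{-1}x$, quadratic forms can always be expressed as a sum of weighted squares:

$$Q(y) = \lambda_1 y_1^2 + \dots + \lambda_n y_n^2,$$

where $\lambda_i$ are the eigenvalues of $A$.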
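
As a worked check of the weighted-squares formula, consider the symmetric matrix with rows (2, 1) and (1, 2), whose eigenvalues are 1 and 3 with unit eigenvectors (1, −1)/√2 and (1, 1)/√2. The new coordinate names y₁, y₂ below follow the notation above; the choice of matrix is only an illustration.

```latex
\begin{aligned}
Q(x, y) &= 2x^2 + 2xy + 2y^2,
\qquad y_1 = \tfrac{1}{\sqrt{2}}(x - y), \quad y_2 = \tfrac{1}{\sqrt{2}}(x + y), \\
1 \cdot y_1^2 + 3 \cdot y_2^2
  &= \tfrac{1}{2}(x - y)^2 + \tfrac{3}{2}(x + y)^2 \\
  &= \tfrac{1}{2}\left(x^2 - 2xy + y^2\right) + \tfrac{3}{2}\left(x^2 + 2xy + y^2\right) \\
  &= 2x^2 + 2xy + 2y^2 .
\end{aligned}
```

The cross term 2xy has disappeared in the eigenvector coordinates, leaving only the eigenvalues as weights.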

Geometric Interpretation

Quadratic forms describe geometric shapes:

Diagonalization aligns the coordinate axes with the principal axes of the shape.

Why this matters

Quadratic forms unify geometry and algebra. They are central in optimization (minimizing energy functions), statistics (covariance matrices and variance), mechanics (kinetic energy), and numerical analysis. Understanding quadratic forms leads directly to the spectral theorem.

Exercises 9.1

  1. Write the quadratic form $Q(x,y) = 3x^2 + 4xy + y^2$ as $x^T A x$ for some symmetric matrix $A$.
  2. For $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$, compute $Q(x,y)$ explicitly.
  3. Diagonalize the quadratic form $Q(x,y) = 2x^2 + 2xy + 3y^2$.
  4. Identify the conic section given by $Q(x,y) = x^2 - y^2$.
  5. Show that if $A$ is symmetric, quadratic forms defined by $A$ and $A^T$ are identical.

9.2 Positive Definite Matrices

Quadratic forms are especially important when their associated matrices are positive definite, since these guarantee positivity of energy, distance, or variance. Positive definiteness is a cornerstone in optimization, numerical analysis, and statistics.

Definition

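A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is called:

  • positive definite if $x^T A x > 0$ for all nonzero $x \in \mathbb{R}^n$;
  • positive semidefinite if $x^T A x \ge 0$ for all $x$.

Similarly, negative definite ($x^T A x < 0$ for all nonzero $x$) and indefinite ($x^T A x$ takes both positive and negative values) matrices are defined.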

Examples

Example 9.2.1.

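$$A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$$

is positive definite, since

$$Q(x,y) = 2x^2 + 3y^2 > 0$$

for all $(x,y) \neq (0,0)$.

Example 9.2.2.

$$A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$$

has quadratic form

$$Q(x,y) = x^2 + 4xy + y^2.$$

This matrix is not positive definite, since $Q(1,-1) = -2 < 0$.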

Characterizations

For a symmetric matrix A:

  1. Eigenvalue test: $A$ is positive definite if and only if all eigenvalues of $A$ are positive.
  2. Principal minors test (Sylvester’s criterion): $A$ is positive definite if and only if all leading principal minors (determinants of top-left $k \times k$ submatrices) are positive.
  3. Cholesky factorization: $A$ is positive definite if and only if it can be written as

$$A = R^T R,$$

where $R$ is an upper triangular matrix with positive diagonal entries.
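
The Cholesky characterization doubles as a practical test: attempt the factorization and see whether it succeeds. The sketch below is a plain-Python illustration, not a library routine. It assumes the input is symmetric, computes the lower triangular factor L (so that the upper triangular factor is its transpose), and returns `None` when a pivot is not positive; `cholesky_lower` is a hypothetical name.

```python
import math

def cholesky_lower(A):
    """Return lower-triangular L with A = L L^T, or None if A is not positive definite.

    Assumes A is symmetric; the upper-triangular factor R in A = R^T R is L transposed.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = A[i][i] - s
                if pivot <= 0:
                    return None  # factorization breaks down: not positive definite
                L[i][j] = math.sqrt(pivot)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

pd = cholesky_lower([[4.0, 2.0], [2.0, 5.0]])      # illustrative positive definite matrix
not_pd = cholesky_lower([[1.0, 2.0], [2.0, 1.0]])  # matrix of Example 9.2.2
print(pd, not_pd)
```

In floating-point arithmetic a matrix that is only barely positive definite may still fail this test, so the result is best read as a numerical indication.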

Geometric Interpretation

Applications

Why this matters

Positive definiteness provides stability and guarantees in mathematics and computation. It ensures energy functions are bounded below, optimization problems have unique solutions, and statistical models are meaningful.

Exercises 9.2

  1. Use Sylvester’s criterion to check whether

$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$$

is positive definite.

  2. Determine whether

$$A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

is positive definite, semidefinite, or indefinite.

  3. Find the eigenvalues of

$$A = \begin{bmatrix} 4 & 2 \\ 2 & 3 \end{bmatrix},$$

and use them to classify definiteness.

  4. Prove that all diagonal matrices with positive entries are positive definite.

  5. Show that if $A$ is positive definite, then so is $P^T A P$ for any invertible matrix $P$.

9.3 Spectral Theorem

The spectral theorem is one of the most powerful results in linear algebra. It states that symmetric matrices can always be diagonalized by an orthogonal basis of eigenvectors. This links algebra (eigenvalues), geometry (orthogonal directions), and applications (stability, optimization, statistics).

Statement of the Spectral Theorem

If $A \in \mathbb{R}^{n \times n}$ is symmetric ($A^T = A$), then:

  1. All eigenvalues of $A$ are real.
  2. There exists an orthonormal basis of $\mathbb{R}^n$ consisting of eigenvectors of $A$.
  3. Thus, $A$ can be written as

$$A = Q \Lambda Q^T,$$

where $Q$ is an orthogonal matrix ($Q^T Q = I$) and $\Lambda$ is diagonal with eigenvalues of $A$ on the diagonal.

Consequences

Example 9.3.1

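Let

$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$

  1. Characteristic polynomial:

$$p(\lambda) = (2 - \lambda)^2 - 1 = \lambda^2 - 4\lambda + 3.$$

Eigenvalues: $\lambda_1 = 1$, $\lambda_2 = 3$.

  2. Eigenvectors: for $\lambda_1 = 1$, $v_1 = (1, -1)$; for $\lambda_2 = 3$, $v_2 = (1, 1)$.
  3. Normalize eigenvectors:

$$u_1 = \tfrac{1}{\sqrt{2}}(1, -1), \quad u_2 = \tfrac{1}{\sqrt{2}}(1, 1).$$

  4. Then

$$Q = \begin{bmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\[6pt] -\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{bmatrix}, \quad \Lambda = \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}.$$

So

$$A = Q \Lambda Q^T.$$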
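
The factorization in this example can be verified by multiplying it back out. The following is a minimal sketch in plain Python, assuming the orthogonal matrix whose columns are (1, −1)/√2 and (1, 1)/√2 and the eigenvalues 1 and 3; `matmul` and `transpose` are helpers defined here.

```python
import math

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(M):
    """Swap rows and columns."""
    return [list(row) for row in zip(*M)]

s = 1 / math.sqrt(2)
Q = [[s, s], [-s, s]]          # columns are the unit eigenvectors
Lam = [[1.0, 0.0], [0.0, 3.0]]

QtQ = matmul(transpose(Q), Q)                      # should be close to the identity
A_rebuilt = matmul(matmul(Q, Lam), transpose(Q))   # should be close to [[2, 1], [1, 2]]
print(QtQ, A_rebuilt)
```

Floating-point rounding means the entries match only up to a small error, which is why the comparison is "close to" rather than exact.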

Geometric Interpretation

The spectral theorem says every symmetric matrix acts like independent scaling along orthogonal directions. In geometry, this corresponds to stretching space along perpendicular axes.

Applications

Why this matters

The spectral theorem guarantees that symmetric matrices are as simple as possible: they can always be analyzed in terms of real, orthogonal eigenvectors. This provides both deep theoretical insight and powerful computational tools.

Exercises 9.3

  1. Diagonalize

$$A = \begin{bmatrix} 4 & 2 \\ 2 & 3 \end{bmatrix}$$

using the spectral theorem.

  2. Prove that all eigenvalues of a real symmetric matrix are real.

  3. Show that eigenvectors corresponding to distinct eigenvalues of a symmetric matrix are orthogonal.

  4. Explain geometrically how the spectral theorem describes ellipsoids defined by quadratic forms.

  5. Apply the spectral theorem to the covariance matrix

$$\Sigma = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix},$$

and interpret the eigenvectors as principal directions of variance.

9.4 Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a widely used technique in data science, machine learning, and statistics. At its core, PCA is an application of the spectral theorem to covariance matrices: it finds orthogonal directions (principal components) that capture the maximum variance in data.

The Idea

Given a dataset of vectors $x_1, x_2, \dots, x_m \in \mathbb{R}^n$:

  1. Center the data by subtracting the mean vector $\bar{x}$.
  2. Form the covariance matrix

$$\Sigma = \frac{1}{m} \sum_{i=1}^{m} (x_i - \bar{x})(x_i - \bar{x})^T.$$

  3. Apply the spectral theorem: $\Sigma = Q \Lambda Q^T$.

    • Columns of $Q$ are orthonormal eigenvectors (principal directions).
    • Eigenvalues in $\Lambda$ measure variance explained by each direction.

The first principal component is the eigenvector corresponding to the largest eigenvalue; it is the direction of maximum variance.

Example 9.4.1

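Suppose we have two-dimensional data points roughly aligned along the line $y = x$. The covariance matrix is approximately

$$\Sigma = \begin{bmatrix} 2 & 1.9 \\ 1.9 & 2 \end{bmatrix}.$$

Eigenvalues are about $3.9$ and $0.1$. The eigenvector for $\lambda = 3.9$ is approximately $(1,1)/\sqrt{2}$.

The first component therefore accounts for about $3.9/4 \approx 97.5\%$ of the total variance. Thus PCA reduces the data to essentially one dimension.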
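
For a 2×2 covariance matrix the leading eigenpair has a closed form, so the numbers in this example can be reproduced without a linear algebra library. The sketch below assumes a symmetric 2×2 input; `top_component` is a hypothetical helper name, and the "explained" ratio is the largest eigenvalue divided by the trace.

```python
import math

def top_component(S):
    """Largest eigenvalue and unit eigenvector of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = S
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    lam = mean + radius
    # (b, lam - a) solves (S - lam*I)v = 0 when b != 0; fall back to an axis otherwise.
    if b != 0:
        v = (b, lam - a)
    else:
        v = (1.0, 0.0) if a >= d else (0.0, 1.0)
    norm = math.hypot(*v)
    return lam, (v[0] / norm, v[1] / norm)

Sigma = [[2.0, 1.9], [1.9, 2.0]]
lam, direction = top_component(Sigma)
explained = lam / (Sigma[0][0] + Sigma[1][1])  # share of total variance
print(lam, direction, explained)
```

In higher dimensions no such closed form is available, and one would normally call a symmetric eigenvalue routine from a numerical library instead.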

Applications of PCA

  1. Dimensionality reduction: Represent data with fewer features while retaining most variance.
  2. Noise reduction: Small eigenvalues correspond to noise; discarding them filters data.
  3. Visualization: Projecting high-dimensional data onto top 2 or 3 principal components reveals structure.
  4. Compression: PCA is used in image and signal compression.

Connection to the Spectral Theorem

The covariance matrix Σ is always symmetric and positive semidefinite. Hence by the spectral theorem, it has an orthonormal basis of eigenvectors and nonnegative real eigenvalues. PCA is nothing more than re-expressing data in this eigenbasis.

Why this matters

PCA demonstrates how abstract linear algebra directly powers modern applications. Eigenvalues and eigenvectors give a practical method for simplifying data, revealing patterns, and reducing complexity. It is one of the most important algorithms derived from the spectral theorem.

Exercises 9.4

  1. Show that the covariance matrix is symmetric and positive semidefinite.
  2. Compute the covariance matrix of the dataset (1,2),(2,3),(3,4), and find its eigenvalues and eigenvectors.
  3. Explain why the first principal component captures the maximum variance.
  4. In image compression, explain how PCA can reduce storage by keeping only the top k principal components.
  5. Prove that the sum of the eigenvalues of the covariance matrix equals the total variance of the dataset.

Chapter 10. Linear Algebra in Practice

10.1 Computer Graphics (Rotations, Projections)

Linear algebra is the language of modern computer graphics. Every image rendered on a screen, every 3D model rotated or projected, is ultimately the result of applying matrices to vectors. Rotations, reflections, scalings, and projections are all linear transformations, making matrices the natural tool for manipulating geometry.

Rotations in 2D

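A counterclockwise rotation by an angle $\theta$ in the plane is represented by

$$R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$

For any vector $v \in \mathbb{R}^2$, the rotated vector is

$$v' = R_\theta v.$$

This preserves lengths and angles, since $R_\theta$ is orthogonal with determinant 1.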
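
A rotation is only a few lines of code once the matrix is written out. This is a minimal sketch in plain Python, with the angle in radians and an arbitrarily chosen test vector; `rotate` is a helper defined here.

```python
import math

def rotate(v, theta):
    """Apply the 2D rotation matrix R_theta to v = (x, y); theta in radians."""
    x, y = v
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

v = (3.0, 4.0)
w = rotate(v, math.pi / 4)   # rotate by 45 degrees counterclockwise
print(w, math.hypot(*v), math.hypot(*w))
```

The two printed lengths agree up to rounding, illustrating that rotation preserves the norm.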

Rotations in 3D

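In three dimensions, rotations are represented by $3 \times 3$ orthogonal matrices with determinant 1. For example, a rotation about the z-axis is

$$R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$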

Similar formulas exist for rotations about the x- and y-axes.
More general 3D rotations can be described by axis–angle representation or quaternions, but the underlying idea is still linear transformations represented by matrices.

Projections

To display 3D objects on a 2D screen, we use projections:

  1. Orthogonal projection: drops the z-coordinate, mapping $(x,y,z) \mapsto (x,y)$.

$$P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.$$

  2. Perspective projection: mimics the effect of a camera. A point $(x,y,z)$ projects to

$$\left(\frac{x}{z}, \frac{y}{z}\right),$$

capturing how distant objects appear smaller.

These operations are linear (orthogonal projection) or nearly linear (perspective projection becomes linear in homogeneous coordinates).

Homogeneous Coordinates

To unify translations and projections with linear transformations, computer graphics uses homogeneous coordinates. A 3D point (x,y,z) is represented as a 4D vector (x,y,z,1). Transformations are then 4×4 matrices, which can represent rotations, scalings, and translations in a single framework.

Example: Translation by $(a,b,c)$:

$$T = \begin{bmatrix} 1 & 0 & 0 & a \\ 0 & 1 & 0 & b \\ 0 & 0 & 1 & c \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
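
The translation matrix can be applied by appending a 1 to the point and multiplying. The sketch below is a plain-Python illustration with arbitrary sample values; `translation` and `apply` are hypothetical helper names, and `apply` assumes the last row of the matrix is (0, 0, 0, 1) so the fourth coordinate stays 1.

```python
def translation(a, b, c):
    """4x4 homogeneous matrix that translates by (a, b, c)."""
    return [[1, 0, 0, a],
            [0, 1, 0, b],
            [0, 0, 1, c],
            [0, 0, 0, 1]]

def apply(M, point):
    """Apply a 4x4 affine matrix to a 3D point via homogeneous coordinates."""
    h = list(point) + [1]                      # (x, y, z) -> (x, y, z, 1)
    out = [sum(M[i][j] * h[j] for j in range(4)) for i in range(4)]
    return tuple(out[:3])                      # drop the trailing 1

moved = apply(translation(2, -1, 5), (1, 1, 1))
print(moved)
```

Because the shift lives in the fourth column, a translation composes with rotations and scalings through ordinary matrix multiplication.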

Geometric Interpretation

Why this matters

Linear algebra enables all real-time graphics: video games, simulations, CAD software, and movie effects. By chaining simple matrix operations, complex transformations are applied efficiently to millions of points per second.

Exercises 10.1

  1. Write the rotation matrix for a 90° counterclockwise rotation in $\mathbb{R}^2$. Apply it to $(1,0)$.
  2. Rotate the point $(1,1,0)$ about the z-axis by 180°.
  3. Show that the determinant of any 2D or 3D rotation matrix is 1.
  4. Derive the orthogonal projection matrix from $\mathbb{R}^3$ to the xy-plane.
  5. Explain how homogeneous coordinates allow translations to be represented as matrix multiplications.

10.2 Data Science (Dimensionality Reduction, Least Squares)

Linear algebra provides the foundation for many data science techniques. Two of the most important are dimensionality reduction, where high-dimensional datasets are compressed while preserving essential information, and the least squares method, which underlies regression and model fitting.

Dimensionality Reduction

High-dimensional data often contains redundancy: many features are correlated, meaning the data essentially lies near a lower-dimensional subspace. Dimensionality reduction identifies these subspaces.

Example 10.2.1. A dataset of 1000 images, each with 1024 pixels, may have most variance captured by just 50 eigenvectors of the covariance matrix. Projecting onto these components compresses the data while preserving essential features.

Least Squares

Often, we have more equations than unknowns, giving an overdetermined system:

$$Ax \approx b, \quad A \in \mathbb{R}^{m \times n}, \ m > n.$$

An exact solution may not exist. Instead, we seek $x$ that minimizes the error

$$\|Ax - b\|^2.$$

This leads to the normal equations:

$$A^T A x = A^T b.$$

For the minimizing $x$, the vector $Ax$ is the orthogonal projection of $b$ onto the column space of $A$.

Example 10.2.2

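Fit a line $y = mx + c$ to data points $(x_i, y_i)$.

Matrix form:

$$A = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_m & 1 \end{bmatrix}, \quad b = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}, \quad x = \begin{bmatrix} m \\ c \end{bmatrix}.$$

Solve $A^T A x = A^T b$. This yields the best-fit line in the least squares sense.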
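
For a line fit the normal equations are a 2×2 system whose entries are simple sums over the data, so they can be solved directly. This is a minimal sketch in plain Python using Cramer's rule and exact fractions; the data points are made up for illustration and `fit_line` is a hypothetical helper name.

```python
from fractions import Fraction

def fit_line(points):
    """Solve the 2x2 normal equations A^T A [m, c]^T = A^T b for y = m*x + c."""
    n = len(points)
    sx = sum(Fraction(x) for x, _ in points)
    sy = sum(Fraction(y) for _, y in points)
    sxx = sum(Fraction(x) * x for x, _ in points)
    sxy = sum(Fraction(x) * y for x, y in points)
    # A^T A = [[sxx, sx], [sx, n]] and A^T b = [sxy, sy]; solve by Cramer's rule.
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("all x-values coincide, so the fit is not unique")
    m = (sxy * n - sx * sy) / det
    c = (sxx * sy - sx * sxy) / det
    return m, c

m, c = fit_line([(0, 1), (1, 2), (2, 4)])
print(m, c)
```

Forming the normal equations explicitly is fine for small, well-conditioned problems like this one; for larger or ill-conditioned data, factorization-based solvers are generally preferred.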

Geometric Interpretation

Both are projection problems, solved using inner products and orthogonality.

Why this matters

Dimensionality reduction makes large datasets tractable, filters noise, and reveals structure. Least squares fitting powers regression, statistics, and machine learning. Both rely directly on eigenvalues, eigenvectors, and projections, the core tools of linear algebra.

Exercises 10.2

  1. Explain why PCA reduces noise in datasets by discarding small eigenvalue components.
  2. Compute the least squares solution to fitting a line through (0,0),(1,1),(2,2).
  3. Show that the least squares solution is unique if and only if $A^T A$ is invertible.
  4. Prove that the least squares solution minimizes the squared error by projection arguments.
  5. Apply PCA to the data points (1,0),(2,1),(3,2) and find the first principal component.

10.3 Networks and Markov Chains

Graphs and networks provide a natural setting where linear algebra comes to life. From modeling flows and connectivity to predicting long-term behavior, matrices translate network structure into algebraic form. Markov chains, already introduced in Section 8.4, are a central example of networks evolving over time.

Adjacency Matrices

A network (graph) with $n$ nodes can be represented by an adjacency matrix $A \in \mathbb{R}^{n \times n}$:

$$A_{ij} = \begin{cases} 1 & \text{if there is an edge from node } i \text{ to node } j, \\ 0 & \text{otherwise.} \end{cases}$$

For weighted graphs, entries may be positive weights instead of 0/1.

Laplacian Matrices

Another important matrix is the graph Laplacian:

$$L = D - A,$$

where $D$ is the diagonal degree matrix ($D_{ii} = \deg(i)$).

This connection between eigenvalues and connectivity forms the basis of spectral graph theory.
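
Building the Laplacian from an adjacency matrix takes only a few lines. The sketch below is a plain-Python illustration for an undirected, unweighted graph; the 3-node path graph is a made-up example and `laplacian` is a hypothetical helper name.

```python
def laplacian(adj):
    """Graph Laplacian L = D - A for an undirected graph given by its adjacency matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]            # degree of each node
    return [[(deg[i] if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]

# Path graph 0 - 1 - 2 (a hypothetical example).
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
L = laplacian(A)

# Every row of L sums to zero, so L times the all-ones vector is the zero vector.
row_sums = [sum(row) for row in L]
print(L, row_sums)
```

The zero row sums show that the all-ones vector is an eigenvector of the Laplacian with eigenvalue 0, which is the starting point for relating its spectrum to connectivity.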

Markov Chains on Graphs

A Markov chain can be viewed as a random walk on a graph. If $P$ is the transition matrix where $P_{ij}$ is the probability of moving from node $j$ to node $i$ (so that each column sums to 1, as in Section 8.4), then

$$x_{k+1} = P x_k$$

describes how the distribution of positions evolves from step $k$ to step $k+1$.

Example 10.3.1

Consider a simple 3-node cycle graph:

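$$P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.$$

This Markov chain cycles deterministically among the nodes. Eigenvalues are the cube roots of unity: $1, e^{2\pi i/3}, e^{4\pi i/3}$. The eigenvalue 1 corresponds to the steady state, which is the uniform distribution $(1/3, 1/3, 1/3)$. Because the other two eigenvalues also have modulus 1, a chain started at a single node keeps cycling rather than converging to this distribution.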

Applications

Why this matters

Linear algebra transforms network problems into matrix problems. Eigenvalues and eigenvectors reveal connectivity, flow, stability, and long-term dynamics. Networks are everywhere (social media, biology, finance, and the internet), so these tools are indispensable.

Exercises 10.3

  1. Write the adjacency matrix of a square graph with 4 nodes. Compute $A^2$ and interpret the entries.
  2. Show that the Laplacian of a connected graph has exactly one zero eigenvalue.
  3. Find the steady-state distribution of the Markov chain with

$$P = \begin{bmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \end{bmatrix}.$$

  4. Explain how eigenvalues of the Laplacian can detect disconnected components of a graph.
  5. Describe how PageRank modifies the transition matrix of the web graph to ensure a unique steady-state distribution.

10.4 Machine Learning Connections

Modern machine learning is built on linear algebra. From the representation of data as matrices to the optimization of large-scale models, nearly every step relies on concepts such as vector spaces, projections, eigenvalues, and matrix decompositions.

Data as Matrices

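A dataset with $m$ examples and $n$ features is represented as a matrix $X \in \mathbb{R}^{m \times n}$:

$$X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_m^T \end{bmatrix},$$

where each row $x_i^T$, with $x_i \in \mathbb{R}^n$, is a feature vector. Linear algebra provides tools to analyze, compress, and transform this data.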

Linear Models

At the heart of machine learning are linear predictors:

$$\hat{y} = Xw,$$

where $w$ is the weight vector. Training often involves solving a least squares problem or a regularized variant such as ridge regression:

$$\min_w \|Xw - y\|^2 + \lambda \|w\|^2.$$

This is solved efficiently using matrix factorizations.

Singular Value Decomposition (SVD)

The SVD of a matrix X is

$$X = U \Sigma V^T,$$

where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal with nonnegative entries (singular values).

Eigenvalues in Machine Learning

Neural Networks

Even deep learning, though nonlinear, uses linear algebra at its core:

Why this matters

Machine learning models often involve datasets with millions of features and parameters. Linear algebra provides the algorithms and abstractions that make training and inference possible. Without it, large-scale computation in AI would be intractable.

Exercises 10.4

  1. Show that ridge regression leads to the normal equations

$$(X^T X + \lambda I) w = X^T y.$$

  2. Explain how SVD can be used to compress an image represented as a matrix of pixel intensities.

  3. For a covariance matrix $\Sigma$, show why its eigenvalues represent variances along principal components.

  4. Give an example of how eigenvectors of the Laplacian matrix can be used for clustering a small graph.

  5. In a neural network with one hidden layer, write the forward pass in matrix form.