Matrix-vector multiplication

Let $f$ be a linear function that takes 2D vectors as inputs. Previously we noticed that the whole function is completely determined by $\red{f(1,0)}$ and $\blue{f(0,1)}$: if you know what those are, you can calculate $f(x,y)=x\red{f(1,0)}+y\blue{f(0,1)}$ for any $x$ and $y$. We will now introduce some notation to make this easier.

Matrices as a way to write a linear function

It is common to write the vectors $\red{f(1,0)}$ and $\blue{f(0,1)}$ into a matrix so that $\red{f(1,0)}$ is the first column and $\blue{f(0,1)}$ is the second column. For example, for $\red{f(1,0) = (2,3)}$ and $\blue{f(0,1) = (4,5)}$, the matrix would be $$ \begin{bmatrix} \red2&\blue4 \\ \red3&\blue5 \end{bmatrix}. $$ A matrix is just a grid of numbers. It doesn't have to be the same size in both directions. For example, if $f$ takes 3D inputs and produces 2D outputs, so that $\red{f(1,0,0)=(2,3)}$, $\blue{f(0,1,0)=(4,5)}$ and $\green{f(0,0,1)=(6,7)}$, the matrix would be $$ \begin{bmatrix} \red2&\blue4&\green6 \\ \red3&\blue5&\green7 \end{bmatrix}. $$

The matrix of a linear function $f$ is a matrix whose columns are $f(1,0,0,\dots,0)$, $f(0,1,0,\dots,0)$, $\dots$, $f(0,0,\dots,0,1)$, written vertically.

A matrix with 2 rows and 3 columns is also called a $2 \times 3$ matrix. Note that the number of rows (the height) goes first, even though $x$ usually goes before $y$. To avoid this confusion, in these derivations I tend to say something like "width 3 and height 2" instead of "$2 \times 3$ matrix".

Examples:
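The definition above can be sketched in Python: evaluate $f$ at each of the inputs $(1,0,\dots,0)$, $(0,1,\dots,0)$ and so on, and use the results as columns. The helper name `matrix_of`, the example function `f`, and the choice to store the grid as a list of rows are assumptions made for this illustration only.

```python
def matrix_of(f, input_dim):
    """Return the matrix of the linear function f as a list of rows.

    Column j of the matrix is f applied to the input that has a 1 in
    position j and 0 everywhere else.
    """
    columns = []
    for j in range(input_dim):
        basis = [1 if i == j else 0 for i in range(input_dim)]
        columns.append(list(f(*basis)))
    # We collected columns, but the grid is stored row by row, so transpose.
    return [list(row) for row in zip(*columns)]


def f(x, y, z):
    # Hypothetical linear function chosen so that
    # f(1,0,0) = (2,3), f(0,1,0) = (4,5) and f(0,0,1) = (6,7).
    return (2 * x + 4 * y + 6 * z, 3 * x + 5 * y + 7 * z)


matrix = matrix_of(f, 3)
print(matrix)  # [[2, 4, 6], [3, 5, 7]]
```

The printed grid has width 3 and height 2, matching the 3D-input, 2D-output example above.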

Introducing matrix-vector multiplication

In this context, vectors like $(10,20)$ are usually written vertically: $$ \begin{bmatrix}10\\20\end{bmatrix}. $$ Writing a matrix and a vector next to each other, e.g. $$ \begin{bmatrix}2&4 \\ 3&5 \end{bmatrix}\begin{bmatrix}10\\20\end{bmatrix} $$ means $f(\text{the vector})$, where $f$ is the linear function corresponding to the matrix. For example, a 2D linear function $f$ satisfies $f(x,y)=x\red{f(1,0)}+y\blue{f(0,1)}$, so if $\red{f(1,0)=(2,3)}$ and $\blue{f(0,1)=(4,5)}$, we get $$ \begin{align} f({10},{20})&={10}\red{f(1,0)}+{20}\blue{f(0,1)} \\ = \begin{bmatrix} \red2 & \blue4 \\ \red3 & \blue5 \end{bmatrix} \begin{bmatrix} {10} \\ {20} \end{bmatrix} &= {10}\red{\begin{bmatrix} 2 \\ 3 \end{bmatrix}} + {20}\blue{\begin{bmatrix}4 \\ 5 \end{bmatrix}} =\begin{bmatrix} 20+80 \\ 30+100 \end{bmatrix} = \begin{bmatrix} 100 \\ 130 \end{bmatrix}. \end{align} $$

Multiplying a matrix and a vector means creating a linear combination of the columns of the matrix with numbers from the vector as coefficients. This calculates $f(\text{the vector})$, where $f$ is the linear function corresponding to the matrix.

I will later explain why this operation is called multiplying.

Examples:
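Here is a minimal Python sketch of the rule above, computing the result as a linear combination of the columns. The function name `mat_vec` and the list-of-rows representation are assumptions for illustration. The sketch also assumes the vector has exactly as many numbers as the matrix has columns.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers).

    The result is built up as coefficient * column, one column at a time.
    Assumes len(vector) equals the number of columns of the matrix.
    """
    height = len(matrix)
    result = [0] * height
    for j, coefficient in enumerate(vector):
        # Add coefficient times column j of the matrix to the result.
        for i in range(height):
            result[i] += coefficient * matrix[i][j]
    return result


# 10 * (2, 3) + 20 * (4, 5) = (20 + 80, 30 + 100)
print(mat_vec([[2, 4], [3, 5]], [10, 20]))  # [100, 130]
```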

Input and output dimensions

Because the numbers in the input vector correspond to the columns of the matrix, the matrix must have as many columns as there are numbers in the vector. For example, this matrix-vector multiplication is undefined, because the vector contains 2 numbers but the matrix has 3 columns: $$ \begin{bmatrix} \red1&\blue3&\green5 \\ \red2&\blue4&\green6 \end{bmatrix} \begin{bmatrix} 7 \\ 8 \end{bmatrix} = 7\red{\begin{bmatrix}1 \\ 2\end{bmatrix}} + 8\blue{\begin{bmatrix}3 \\ 4\end{bmatrix}} + \text{???}\green{\begin{bmatrix}5 \\ 6\end{bmatrix}} $$ Here's a matrix-vector multiplication that actually works: $$ \begin{bmatrix} \red1&\blue3&\green5 \\ \red2&\blue4&\green6 \end{bmatrix} \begin{bmatrix} 7\\8\\9 \end{bmatrix} = 7\red{\begin{bmatrix}1 \\ 2\end{bmatrix}} + 8\blue{\begin{bmatrix}3 \\ 4\end{bmatrix}} + 9\green{\begin{bmatrix}5 \\ 6\end{bmatrix}} = \begin{bmatrix}76 \\ 100\end{bmatrix} $$ Here the width of the matrix is 3 and the height is 2. It can be multiplied by 3D vectors, and the result is a 2D vector. This works in general.

The width of a matrix is the dimension of its input vectors. The height of a matrix is the dimension of its output vectors.
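The width and height rule can be checked in code before multiplying. The following Python sketch uses hypothetical helper names (`width`, `height`, `mat_vec_checked`) and stores the matrix as a list of rows. Raising `ValueError` for a mismatched vector is one possible way to signal "undefined", not the only one.

```python
def width(matrix):
    # Number of columns = dimension of the input vectors.
    return len(matrix[0])


def height(matrix):
    # Number of rows = dimension of the output vectors.
    return len(matrix)


def mat_vec_checked(matrix, vector):
    """Multiply matrix by vector, refusing inputs of the wrong dimension."""
    if len(vector) != width(matrix):
        raise ValueError(
            f"matrix has width {width(matrix)}, "
            f"but the vector contains {len(vector)} numbers"
        )
    return [
        sum(matrix[i][j] * vector[j] for j in range(width(matrix)))
        for i in range(height(matrix))
    ]


matrix = [[1, 3, 5], [2, 4, 6]]  # width 3, height 2

print(mat_vec_checked(matrix, [7, 8, 9]))  # [76, 100]

try:
    mat_vec_checked(matrix, [7, 8])  # only 2 numbers for 3 columns
except ValueError as error:
    print("undefined:", error)
```

The successful call takes a 3D vector and produces a 2D vector, as the width and height predict.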