In machine learning, you deal a lot with vectors, which can be roughly described as ordered collections of numbers, called components. For example, the following:
$$ \vec{u} = \begin{pmatrix} 1 & 3 & 4 & 5 \end{pmatrix} $$
is a vector, and it is different from:
$$ \vec{v} = \begin{pmatrix} 3 & 4 & 1 & 5 \end{pmatrix} $$
where the elements have a different order.
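As a quick sketch of this idea (using NumPy, a common choice in machine learning, though nothing here depends on any particular library), the two vectors above are distinct precisely because of their ordering:

```python
import numpy as np

# The two example vectors: same components, different order.
u = np.array([1, 3, 4, 5])
v = np.array([3, 4, 1, 5])

# Vector equality requires equality component by component,
# so a mere reordering yields a different vector.
print(np.array_equal(u, v))  # False
```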
Mathematicians have formalized these concepts by introducing vector spaces which are sets containing vectors with specific properties.
You all know the intuitive concept of a set, which is either a finite list of elements, such as $\{\text{pear}, \text{apple}, 3, 📐\}$, or a collection of elements sharing a particular property, like the set of all even natural numbers, the set of all chairs, and so on.
You can “equip” a set with one or more operations to obtain an algebraic structure. For example, you can equip the set of natural numbers with the addition operator, obtaining the structure $(\mathbb{N}, +)$. The structure can then be studied in order to check whether it possesses some “interesting” properties. In the previous example, you can easily see that given any three natural numbers $a$, $b$ and $c$:
$$(a + b) + c = a + (b + c)$$
This property is called associativity and the structure $(\mathbb{N}, +)$ possesses it.
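As a small illustration (the numbers below are arbitrary example values), the property can be checked directly:

```python
# Associativity of addition on natural numbers:
# the grouping of the operands does not change the result.
a, b, c = 2, 7, 11  # arbitrary example values
left = (a + b) + c
right = a + (b + c)
print(left, right)  # 20 20
```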
A vector space is a structure built on top of another structure, called a field, generically denoted $\mathbb{K}$. The field is often a numerical set, like $\mathbb{R}$, equipped with addition and multiplication operators that satisfy their own properties. We use the elements of $\mathbb{K}$ as the components of our vectors, as seen at the beginning of this discussion. An $n$-dimensional vector space over a field $\mathbb{K}$ is denoted $\mathbb{K}^n$, and its vectors are usually written $\vec{u}$, $\vec{v}$, and so on.
A vector space has two important operations:
- scalar multiplication between an element $k \in \mathbb{K}$ and a vector $\vec{v} \in \mathbb{K}^n$. The scalar multiplication $k\vec{v}$ consists of multiplying every component of $\vec{v}$ by $k$, thus $(k\vec{v})_i = k v_i$.
- vector addition between two vectors $\vec{u}, \vec{v} \in \mathbb{K}^n$. The vector addition of $\vec{u}$ and $\vec{v}$, denoted $\vec{u} + \vec{v}$, consists of summing the components of the two vectors element-wise, thus $(\vec{u} + \vec{v})_i = u_i + v_i$.
Both operations return a vector that resides in the vector space. Additionally, another operator between vectors is usually defined, called the scalar/inner/dot product, which returns a scalar. The dot product between two vectors $\vec{u}, \vec{v} \in \mathbb{K}^n$ is defined as $\langle \vec{u}, \vec{v} \rangle = \sum_{i=1}^n u_i \cdot v_i$. In other words, the scalar product consists of multiplying the two vectors element-wise and summing the resulting products.
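The three operations above can be sketched with NumPy (a convenience for this post; the math does not depend on any library):

```python
import numpy as np

k = 3
u = np.array([1, 3, 4, 5])
v = np.array([3, 4, 1, 5])

# Scalar multiplication: every component of v is multiplied by k.
kv = k * v        # [ 9 12  3 15]

# Vector addition: components are summed element-wise.
w = u + v         # [ 4  7  5 10]

# Dot product: element-wise multiplication followed by a sum.
dot = np.sum(u * v)   # 1*3 + 3*4 + 4*1 + 5*5 = 44

print(kv, w, dot)
```

Note that `kv` and `w` are again vectors in the same space, while `dot` is a single scalar, mirroring the definitions above.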
Stacks of vectors are called matrices. For example:
$$ \mathbb{A} = \begin{pmatrix} 1 & 5 & 3 \\ 3 & 4 & 7 \end{pmatrix} $$
is a $2 \times 3$ matrix, with two rows and three columns, and it is composed of two 3d vectors (its rows) or, alternatively, three 2d vectors (its columns). The vectors themselves can be thought of as matrices. For example:
$$ \vec{u} = \begin{pmatrix} 1 & 5 & 3 \end{pmatrix} $$
is a matrix of size $1 \times 3$, and it is called a row vector, while:
$$ \vec{v} = \begin{pmatrix} 1 \\ 3 \end{pmatrix} $$
is a matrix of size $2 \times 1$, and it is called a column vector.
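A sketch of these shapes in NumPy (where a matrix is simply a 2-dimensional array):

```python
import numpy as np

A = np.array([[1, 5, 3],
              [3, 4, 7]])          # 2 x 3 matrix: two rows, three columns
row = np.array([[1, 5, 3]])        # 1 x 3 matrix: a row vector
col = np.array([[1],
                [3]])              # 2 x 1 matrix: a column vector

print(A.shape, row.shape, col.shape)  # (2, 3) (1, 3) (2, 1)
```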
The structure containing the $n \times m$ matrices over a field $\mathbb{K}$ is denoted $\mathbb{K}^{n \times m}$, and matrices (when not dealing with row/column vectors) are usually written $\mathbb{A}$, $\mathbb{B}$, and so on. A perhaps counterintuitive fact is that $\mathbb{K}^{n \times m}$ can itself be thought of as a vector space (that is, matrices can be considered vectors), but, for now, let’s keep it simple.
Like vectors, two $n \times m$ matrices can be summed together in an element-wise manner: $(\mathbb{A} + \mathbb{B})_{ij} = a_{ij} + b_{ij}$, meaning that the element in position $i, j$ of the sum of $\mathbb{A}$ and $\mathbb{B}$ is the sum of the elements in the same row and column of $\mathbb{A}$ and $\mathbb{B}$.
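Matrix addition in the same element-wise spirit (a sketch; the values of the second matrix are made up for illustration):

```python
import numpy as np

A = np.array([[1, 5, 3],
              [3, 4, 7]])
B = np.array([[2, 0, 1],
              [1, 1, 1]])  # arbitrary example values

# (A + B)[i, j] == A[i, j] + B[i, j] for every row i and column j.
S = A + B
print(S)  # [[3 5 4]
          #  [4 5 8]]
```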