From Vectors to Predictions: Linear Algebra in AI & ML
Shashank Rajak
Apr 6, 2025
11 min read

"The machine learning revolution will not be televised: it will be vectorized." - Pete Warden
You ask, the AI answers. It's a dialogue as natural as any conversation. One moment you're asking a complex question; the next, a perfectly crafted response appears. It feels like magic, doesn't it? But behind that seamless interaction lies a hidden world of intricate calculations, a realm where Linear Algebra reigns supreme. A language of vectors, matrices, and transformations; that's what Linear Algebra is all about. It's the silent architect, the unseen force that shapes the AI responses we've come to rely on.
In this article, we'll decode this hidden language, using a classic problem – estimating house prices in New Delhi – to reveal how Linear Algebra empowers AI to understand and respond to our world. Imagine possessing the power to predict the very pulse of the real estate market, to see the hidden patterns that dictate a home's true value. That's the power Linear Algebra unlocks.
We will be using this sample housing prices dataset, which is by no means representative of the scale of data used in machine learning, but it's good to start small.
Carpet Area (sq ft) | Bedrooms | Locality | Age (years) | Price (lakhs) |
800 | 2 | 0 | 10 | 15 |
1200 | 3 | 1 | 5 | 28 |
1500 | 3 | 1 | 3 | 32 |
1000 | 2 | 0 | 8 | 20 |
1800 | 4 | 1 | 2 | 38 |
2000 | 4 | 1 | 1 | 42 |
900 | 2 | 0 | 12 | 16 |
1300 | 3 | 1 | 6 | 30 |
1600 | 3 | 1 | 4 | 35 |
1100 | 2 | 0 | 9 | 23 |
Let's start with a simplified scenario: predicting house prices based only on carpet area. In high school geometry, we learned that a straight line can be represented by the equation:
$$wx + b = y$$
Here
y is the y-coordinate (price)
x is the x-coordinate (carpet area)
w is the slope, and
b is the y-intercept.
Using this equation, we can try to predict house prices. Let's create a scatter plot of the data with price on the y-axis and carpet area on the x-axis.
Our goal is to draw a straight line that best fits these data points; in doing so, we find the values of w and b that define this trend line. This is precisely what machine learning models do: they find a line that best represents the trend in the data. Once we have this trend line, we can take any carpet area on the X axis and read off its corresponding price on the Y axis. Let's try predicting the price for 1400 sq ft, which is not in our dataset: the price predicted by this trend line is approximately 29.73 lakhs.
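As a quick illustration, here is a minimal sketch (using NumPy, with the carpet areas and prices taken from the table above) of fitting that one-feature line; np.polyfit finds the least-squares values of w and b for us.
import numpy as np

# Carpet areas and prices from the sample table above
area = np.array([800, 1200, 1500, 1000, 1800, 2000, 900, 1300, 1600, 1100])
price = np.array([15, 28, 32, 20, 38, 42, 16, 30, 35, 23])

# Fit the best straight line price = w * area + b (least squares)
w, b = np.polyfit(area, price, deg=1)
print(w, b)  # roughly 0.0229 and -2.32

# Predict the price for a 1400 sq ft house
print(w * 1400 + b)  # approx 29.73 lakhs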
This was a simplified example. However, house prices are influenced by more than just carpet area. Factors like the number of bedrooms, locality, and age also play a vital role. Let's update our problem by including these additional features. Also, a note about the locality column in the dataset, which is just 0 or 1: this is a symbolic value used to replace the actual values of the column, which were categorical in nature; the reason being that machine learning models love numbers more than text!
Introducing multiple features and linear algebra
Now that we have understood the core idea of estimating housing prices and expanded the set of features that influence our estimate, we face a new challenge: how do we incorporate these additional features into our prediction model, and how do we write it mathematically? The good news is that the fundamental principle of the line equation \(wx+b = y\) still holds. We just need to adjust it to accommodate these new features. Instead of a line in 2D space, we will be dealing with a hyperplane in a multi-dimensional space, something we cannot draw on paper or even visualize, and this is where maths helps us.
We can extend our line equation to include these new features using the equations below:
$$\begin{matrix} w_1x_{11} + w_2x_{12} + w_3x_{13} + w_4x_{14} + b = y_1 \\ w_1x_{21} + w_2x_{22} + w_3x_{23} + w_4x_{24} + b = y_2 \\ \vdots \end{matrix}$$
Here,
w represents the weights
x represents the features
b is the bias term and
y is the price, also called the label
If we look closely at the equations, we can see that we now have a system of linear equations where x and y are known, and w and b are unknowns. Solving these equations gives us the parameters w and b, with which we can predict prices for a new set of features.
However, solving these equations directly is not such a wise idea once we get to real machine learning, because we will be dealing with millions of equations. This is where vectors and matrices, the foundational elements of Linear Algebra, simplify things for us.
Vectors and Matrices
Let’s take a little pause from our system of linear equations and first try to understand what a vector is.
We can define a vector in multiple ways: as a point in space, or, in the language of physics, as a quantity having both magnitude and direction. In Linear Algebra, however, a vector is generally pictured as an arrow in space starting from the origin and ending at the given point. For example, the point (2, 3) can be represented as a vector \(\vec{v} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}\) in the 2-dimensional XY plane, which conveys that if we move 2 units in the x direction and 3 units in the y direction we will reach the point (2, 3). These numbers are not meaningless; the numbers contained in a vector have some real-life meaning. In our housing example, we can represent the features of a house as a feature vector X = (carpet area, bedrooms, locality, age) and the prices as a vector Y = (price1, price2, ...). Based on this schema, if I write a vector X = (800, 2, 1, 10), we know what each number in this vector stands for. This is how a vector helps us declutter the information and focus on the numbers, the numbers we will operate on soon.
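To make this concrete, here is a tiny NumPy sketch of that same feature vector written in code, following the schema (carpet area, bedrooms, locality, age) described above:
import numpy as np

# One house described as a feature vector: (carpet area, bedrooms, locality, age)
x = np.array([800, 2, 1, 10])
print(x)        # [800   2   1  10]
print(x.shape)  # (4,) -- a 4-dimensional vector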
Linear Algebra allows us to combine two vectors using operations like vector addition and scalar multiplication and produce another vector. For example, a linear combination of vectors \(\vec u\) and \(\vec v\) can be expressed as:
\(c_1\vec u + c_2\vec v = \vec w\)
Where \(c_1\) and \(c_2\) are some scalars, and \(\vec w\) is the resulting vector.
To understand this algebra of vectors, we can take a simple example in the XY plane with two vectors u = (2, 3) and v = (1, -2).
When we multiply the vector \(\vec{u}\) by the scalar \(c_1 = 1\), we are scaling \(\vec{u}\) by a factor of 1, which means the vector remains unchanged. When we multiply the scalar \(c_2 = 2\) with the vector \(\vec{v}\), it is scaled to twice its length.
When we add these two scaled vectors, we add their respective components, and the final result of these two algebraic operations is a new vector \(\vec{w} = (4, -1)\):
$$1 \begin{bmatrix} 2 \\ 3 \end{bmatrix} + 2 \begin{bmatrix} 1 \\ -2 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \end{bmatrix} + \begin{bmatrix} 2 \\ -4 \end{bmatrix} = \begin{bmatrix} 4 \\ -1 \end{bmatrix} = \vec{w}$$
These elementary operations tell us something fundamental about the world of vectors: Linear Combinations. We can combine two vectors to get a new vector.
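Here is the same tiny example written as a minimal NumPy sketch:
import numpy as np

u = np.array([2, 3])
v = np.array([1, -2])

# Linear combination c1*u + c2*v with c1 = 1 and c2 = 2
w = 1 * u + 2 * v
print(w)  # [ 4 -1]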
So far we have looked at vectors, which are a single row or column of numbers. But when we stack multiple vectors together, we get a Matrix.
In our housing example, we can represent the entire dataset as a matrix where each row is the feature vector of a particular house. When we study Linear Algebra, we learn that we can operate on a matrix either row by row or column by column. My preferred way to look at a matrix is the column form, because it allows us to treat the columns as vectors and combine them to get another vector.
Using vectors and matrices, we can rewrite our system of linear equations in a more structured form:
$$Xw + b = y$$
Where w is the weight vector, X is the feature matrix, b is the bias term vector, and y is the price vector.
So now, when we look at these linear equations again after reading about vectors and matrices, we can think about them from the Linear Algebra point of view. We can write them in mathematical form as
$$Xw = y$$
We ignore the bias term b for now to simplify the process but in real life don’t be biased!
$$\begin{bmatrix} x_{11} & x_{12} & x_{13} & x_{14} \\ x_{21} & x_{22} & x_{23} & x_{24} \\ \vdots & \vdots & \vdots & \vdots \\ x_{m1} & x_{m2} & x_{m3} & x_{m4} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$$
Here X is called the coefficient matrix, with all the features written in matrix form; w is the column vector of weight parameters; and y is the prices written as a column vector.
Now, using the concept of linear combinations of vectors that we just learned, we are in a better position to deal with a large system of linear equations. We can treat this as a linear combination problem where we multiply the matrix X with the vector w and try to get the vector y. We are basically trying to find good numbers in the vector w which, when multiplied with each column vector of the matrix X and then added together, give us the vector y.
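To see this column view in action, here is a minimal sketch with the first three rows of our feature matrix; the weight values are made up purely for illustration, and the point is that the matrix-vector product equals a weighted sum of the columns of X:
import numpy as np

# First three rows of the feature matrix X: (carpet area, bedrooms, locality, age)
X = np.array([
    [800, 2, 0, 10],
    [1200, 3, 1, 5],
    [1500, 3, 1, 3],
])
w = np.array([0.02, -1.3, 4.4, -0.2])  # made-up weights, just for illustration

# The matrix-vector product ...
y1 = X @ w

# ... is the same as combining the columns of X, each scaled by its weight
y2 = w[0] * X[:, 0] + w[1] * X[:, 1] + w[2] * X[:, 2] + w[3] * X[:, 3]

print(np.allclose(y1, y2))  # True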
An average machine learning problem involves millions of data points to be analyzed to train any reasonably performing model, and solving such a mammoth system of linear equations by hand seems beyond our mathematical prowess. When we dealt with 2D or 3D space, it was easy to visualize the vectors geometrically, but in the machine learning world we often deal with n-dimensional vectors which we can't even imagine or visualize geometrically. Linear Algebra allows us to deal with these multi-dimensional spaces with the help of vectors and matrices and make sense of the data.
Finally, we can plug in the actual numbers from our reference housing dataset and solve the equations in the matrix form Xw = y to find a good w.
Here we have 10 rows of data and 4 unknowns, so it might still be possible to do the matrix calculations by hand, but in practice we use standard machine learning libraries; Scikit-Learn is the one I have used in the demo code below.
import pandas as pd
from sklearn.linear_model import LinearRegression

# Load the sample housing dataset
df = pd.read_csv("housing_data.csv")

# Matrix X of all the features, vector y of prices
X = df.drop("Price", axis=1)
y = df["Price"].copy()

# Train the model
lin_reg = LinearRegression()
lin_reg.fit(X, y)

# Print the learned parameters
print("Coefficient", lin_reg.coef_)  # Coefficient [ 0.01875 -1.29605263  4.43421053 -0.22368421]

# Prediction on the training data
predictions = lin_reg.predict(X)
print(predictions[0])  # 15.631578947368407
In the above code, when we train the model with lin_reg.fit(X, y), we can guess what must be going on behind the scenes: the library is solving the matrix-vector equation and trying to find the vector \(\vec w\). Technically, the LinearRegression class from the Scikit-Learn library uses concepts like Singular Value Decomposition and the pseudoinverse from linear algebra to solve the equations; this is generally known as the closed-form solution. There is another popular method called Gradient Descent, which I will talk about in future blogs.
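For the curious, here is a rough sketch of that closed-form idea using NumPy's pseudoinverse, continuing with the X and y from the code above; it mirrors the idea rather than Scikit-Learn's exact internals, and a column of ones is appended so the bias term gets estimated too:
import numpy as np

# Append a column of ones so the pseudoinverse also estimates the bias term b
X_b = np.c_[X.to_numpy(), np.ones(len(X))]

# Closed-form solution: the pseudoinverse of X_b (computed via SVD) applied to y
w_full = np.linalg.pinv(X_b) @ y.to_numpy()

print(w_full[:-1])  # weights -- should match lin_reg.coef_
print(w_full[-1])   # bias    -- should match lin_reg.intercept_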
Once the training is complete, we can print the weight coefficients it calculated using lin_reg.coef_
which prints \(\begin{bmatrix} 0.01875 \\ -1.29605263 \\ 4.43421053 \\ -0.22368421 \end{bmatrix}\)
So now, if we want to predict the housing price for any new set of features, we multiply each feature by its corresponding weight from the above vector, add them up, and add the intercept the model also learned (lin_reg.intercept_); that's our predicted housing price.
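As a quick sanity check, here is a minimal sketch of that calculation, continuing with the lin_reg model trained above; the new house's feature values are made up for illustration:
import numpy as np

# Hypothetical new house: (carpet area, bedrooms, locality, age)
new_house = np.array([1400, 3, 1, 4])

# Multiply each feature by its weight, sum them, and add the learned intercept
predicted_price = new_house @ lin_reg.coef_ + lin_reg.intercept_
print(predicted_price)  # predicted price in lakhs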
Conclusion
As demonstrated, Linear Algebra provides an elegant framework for representing and manipulating multi-dimensional data, even in a simple housing prediction problem. This framework is crucial for tackling more complex problems in AI, such as image and text processing.
We started this journey by asking how an AI model swiftly answers our questions. Now, we've seen how linear algebra provides the essential tools for machine learning algorithms to learn from data and make predictions.
When we ask an AI model a question, it often involves complex processes like natural language processing, which relies heavily on linear algebra. The model might represent words and sentences as vectors, perform matrix operations to understand relationships, and use linear algebra techniques to generate responses.
Even in our simplified housing price prediction example, we saw how linear algebra allows us to represent data, build models, and make predictions. In more complex AI tasks, the principles remain the same, just on a much larger scale.
Essentially, when an AI model 'thinks' and generates an answer, it's performing a series of mathematical operations, and linear algebra is the backbone of those operations. So, the next time you get a quick and insightful answer from your favorite AI, remember the power of vectors, matrices, and linear equations working behind the scenes.
As AI continues to evolve, understanding linear algebra becomes increasingly crucial for anyone looking to delve deeper into this fascinating field. It's not just a set of mathematical tools; it's the language that enables machines to learn, understand, and interact with the world around them.