Key focus: Demonstrate the basics of univariate linear regression using Python SciPy functions: train the model, then use it for predictions.
Linear regression model
Regression is a framework for fitting models to data. At a fundamental level, a linear regression model assumes a linear relationship between the input variables ($x$) and the output variable ($y$).
If the dataset has only one input variable and one output variable, we have the simplest configuration for a regression model, and the regression is termed univariate regression. Multivariate regression extends the concept to more than one independent variable and/or dependent variable.
Univariate regression example
Let us start with a fictitious dataset. We construct the dataset ourselves and use it to understand linear regression, which is a supervised machine learning technique. The data samples are randomly generated and follow a roughly linear trend.
import numpy as np
import matplotlib.pyplot as plt #for plotting
np.random.seed(0) #to generate predictable random numbers
m = 100 #number of samples
x = np.random.rand(m,1) #uniformly distributed random numbers
theta_0 = 50 #intercept
theta_1 = 35 #coefficient
noise_sigma = 3
noise = noise_sigma*np.random.randn(m,1) #gaussian random noise
y = theta_0 + theta_1*x + noise #noise added target
plt.ion() #interactive plot on
fig,ax = plt.subplots(nrows=1,ncols=1)
plt.plot(x,y,'.',label='training data')
plt.xlabel(r'Feature $x_1$');plt.ylabel(r'Target $y$')
plt.title('Feature vs. Target')
In this example, the data samples represent the feature $x_1$ and the corresponding target $y$.
Linear regression
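Let $m$ be the number of training samples, $x_1$ the input feature, and $y$ the target.
In the univariate linear regression problem, we seek to approximate the target $y$ as a linear function of the input $x_1$:
$$\hat{y} = \theta_0 + \theta_1 x_1$$
where $\theta_0$ is the intercept and $\theta_1$ is the coefficient.
Using all the $m$ training samples, we can write one such equation per sample:
$$y^{(i)} \approx \theta_0 + \theta_1 x_1^{(i)}, \quad i = 1, \dots, m$$
If we represent the variables in matrix form, with $X$ the $m \times 2$ matrix whose $i$-th row is $[1, x_1^{(i)}]$, $\theta = [\theta_0, \theta_1]^T$, and $y$ the $m \times 1$ vector of targets, the equations become $y \approx X\theta$.
It may seem that the solution for finding $\theta$ is simply $\theta = X^{-1} y$.
However, matrix inversion is not defined for matrices that are not square. The Moore-Penrose pseudo-inverse generalizes the concept of matrix inversion to an $m \times n$ matrix. Denoting the pseudo-inverse of $X$ as $X^{+}$, the estimate is $\hat{\theta} = X^{+} y$.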
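The point about non-square matrices can be checked numerically. The following is a minimal sketch, assuming a small hypothetical 5×2 matrix `X_demo` that is not part of the example dataset: `numpy.linalg.inv` rejects it, while `scipy.linalg.pinv` returns a 2×5 matrix that acts as a left inverse.

```python
import numpy as np
from scipy.linalg import pinv

# Hypothetical 5x2 matrix (illustration only): a column of ones and one feature column
X_demo = np.c_[np.ones(5), np.arange(5.0)]

# Ordinary inversion is only defined for square matrices
try:
    np.linalg.inv(X_demo)
    inv_failed = False
except np.linalg.LinAlgError:
    inv_failed = True

# The pseudo-inverse exists for any shape; here it is 2x5
X_pinv = pinv(X_demo)

# For a matrix with full column rank, pinv(X) @ X is the identity
left_identity = X_pinv @ X_demo

print(inv_failed, X_pinv.shape)
print(np.round(left_identity, 6))
```

Note that the product in the other order, `X_demo @ X_pinv`, is a 5×5 projection matrix and is not the identity in general.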
For coding in Python, we use the scipy.linalg.pinv function to compute the Moore-Penrose pseudo-inverse and estimate $\theta$.
xMat = np.c_[ np.ones([len(x),1]), x ] #form x matrix
from scipy.linalg import pinv
theta_estimate = pinv(xMat).dot(y)
print(f'theta_0 estimate: {theta_estimate[0]}')
print(f'theta_1 estimate: {theta_estimate[1]}')
The code results in the following estimates for $\theta_0$ and $\theta_1$:
>> theta_0 estimate: [50.66645323]
>> theta_1 estimate: [34.81080506]
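As a cross-check, the same least-squares problem can be solved in other ways. The sketch below regenerates the dataset with the same seed and compares the pseudo-inverse estimate with `numpy.linalg.lstsq` and with the normal-equation form $(X^T X)^{-1} X^T y$. The names `theta_pinv`, `theta_lstsq`, and `theta_normal` are introduced here for illustration only. All three estimates are expected to agree to within floating-point tolerance.

```python
import numpy as np
from scipy.linalg import pinv

# Recreate the dataset used above (same seed, same parameters)
np.random.seed(0)
m = 100
x = np.random.rand(m, 1)
y = 50 + 35 * x + 3 * np.random.randn(m, 1)

X = np.c_[np.ones((m, 1)), x]  # design matrix with rows [1, x_1]

# Three routes to the least-squares estimate of [theta_0, theta_1]
theta_pinv = pinv(X) @ y                              # pseudo-inverse
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # NumPy least-squares solver
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)      # normal equations

print(np.allclose(theta_pinv, theta_lstsq))
print(np.allclose(theta_pinv, theta_normal))
```

For a small, well-conditioned problem like this one the choice makes little practical difference; the normal-equation route tends to be less numerically robust when the columns of $X$ are nearly collinear.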
Now that we know the parameters $\theta_0$ and $\theta_1$ of the model, we can predict the target for new, unseen inputs.
x_new = np.array([[-0.2],[0.5],[1.2] ]) #new unseen inputs
x_newmat = np.c_[ np.ones([len(x_new),1]), x_new ] #form xNew matrix
y_predict = np.dot(x_newmat,theta_estimate)
>>> y_predict #predicted y values for new inputs for x_1
array([[43.70429222],
[68.07185576],
[92.43941931]])
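The fitting and prediction steps can also be wrapped into two small helpers. This is a sketch only: the function names `fit_line` and `predict_line` are hypothetical and do not come from any library. As a sanity check, the standard deviation of the training residuals should come out close to the noise level of 3 used to generate the data.

```python
import numpy as np
from scipy.linalg import pinv

def fit_line(x, y):
    """Estimate [theta_0, theta_1] by least squares via the pseudo-inverse."""
    X = np.c_[np.ones((len(x), 1)), x]  # prepend the column of ones
    return pinv(X) @ y

def predict_line(theta, x_new):
    """Evaluate theta_0 + theta_1 * x_1 for each row of x_new."""
    X_new = np.c_[np.ones((len(x_new), 1)), x_new]
    return X_new @ theta

# Recreate the dataset used above (same seed, same parameters)
np.random.seed(0)
m = 100
x = np.random.rand(m, 1)
y = 50 + 35 * x + 3 * np.random.randn(m, 1)

theta = fit_line(x, y)
y_new = predict_line(theta, np.array([[-0.2], [0.5], [1.2]]))

# Residuals on the training data; their spread reflects the added noise
residual_std = np.std(y - predict_line(theta, x))

print(y_new.ravel())
print(residual_std)
```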
The approximated target, a linear function of the feature, is plotted as a straight line.
plt.plot(x_new,y_predict,'-',label='prediction')
plt.text(0.7, 55, r'Intercept $\theta_0$ = %0.2f'%theta_estimate[0,0]) #index the scalar, not a 1-element array
plt.text(0.7, 50, r'Coefficient $\theta_1$ = %0.2f'%theta_estimate[1,0])
plt.text(0.5, 45, r'y= $\theta_0+ \theta_1 x_1$ = %0.2f + %0.2f $x_1$'%(theta_estimate[0,0],theta_estimate[1,0]))
plt.legend() #plot legend