Linear Regression Simplified
Linear Regression: Fitting a straight line to a set of observations.
That’s it.
It is the simplest form of regression analysis.
Example
There is a group of people that I measured, and the two features I measured are their weight and their height.
To make it clear, put the weight on the x-axis and the height on the y-axis.
Plotting people's weights against their heights, I can see that there is a linear relationship.
How about I fit a straight line to it? I can use that line to predict new values.
You're creating a line to predict new values based on observations made in the past.
Why is it called Regression?
So there's nothing particularly "regressive" about regression analysis; the name comes from a historical misunderstanding traced back to Francis Galton. For the interesting story of its origin, read:
blog.minitab.com/en/statistics-and-quality-...
How does Linear Regression Work?
Internally it uses a technique called least squares.
The way it works is it tries to minimize the squared error between each point and the line. The error is just the vertical distance between each point and the line that you have.
The slope just turns out to be the correlation between the two variables times the standard deviation of y, divided by the standard deviation of x.
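That slope formula is easy to check by hand. Here's a minimal sketch, using made-up weight/height numbers purely for illustration:

```python
import numpy as np

# Hypothetical data: weights (x) and heights (y), for illustration only
x = np.array([55.0, 62.0, 70.0, 78.0, 85.0, 93.0])
y = np.array([160.0, 165.0, 171.0, 176.0, 180.0, 188.0])

# Slope = correlation(x, y) * std(y) / std(x)
r = np.corrcoef(x, y)[0, 1]
slope = r * y.std() / x.std()

# The best-fit line passes through the point of means
intercept = y.mean() - slope * x.mean()
```

This gives exactly the same line that a least-squares solver (e.g. `np.polyfit(x, y, 1)`) would produce.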
Just remember: least squares minimizes the sum of squared errors from each point to the line. Another way of thinking about linear regression is that you're finding the line that maximizes the likelihood of the observed data (assuming normally distributed errors). So people sometimes call this maximum likelihood estimation.
If you hear someone talk about maximum likelihood estimation in this context, they're usually talking about regression.
Types of Linear Regression
- Simple Linear Regression: Finding the relationship between a single independent variable (input) and a corresponding dependent variable (output).
- Multiple Linear Regression: Finding the relationship between 2 or more independent variables (inputs) and the corresponding dependent variable (output).
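To make the multiple case concrete, here's a small sketch fitting two inputs at once with NumPy's least-squares solver. The data and coefficients are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 samples, two independent variables
X = rng.normal(size=(100, 2))
y = 4.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.1, 100)

# Prepend a column of ones so the intercept is fitted too
A = np.column_stack([np.ones(len(X)), X])
coeffs, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
intercept, b1, b2 = coeffs  # recovers roughly 4.0, 2.0, -1.5
```

Simple linear regression is just the special case where `X` has a single column.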
Measuring Efficiency with R-Squared
And r-squared is also known as the coefficient of determination. It is the fraction of the total variation in y that is captured by your model.
The way to interpret r-squared: you'll get a value that ranges from 0 to 1.
- 0 (zero) means your fit is terrible; it doesn't capture any of the variance in your data.
- 1 (one) is a perfect fit; all of the variance in your data gets captured by the line.
So a low r-squared value means it’s a poor fit; high r-squared value means it’s a good fit.
And you can use r-squared as a quantitative measure of how good a given regression is to a set of data points, and then use that to choose the model that best fits your data.
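The "fraction of variation captured" definition translates directly into code. Here's a minimal sketch of computing r-squared from residuals:

```python
import numpy as np

def r_squared(y, y_pred):
    """Fraction of the total variance in y captured by the predictions."""
    ss_res = np.sum((y - y_pred) ** 2)    # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r_squared(y, y))  # perfect predictions -> 1.0
```

Predicting the mean of `y` for every point gives r-squared of 0, matching the "captures none of the variance" case above.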
“A gradient measures how much the output of a function changes if you change the inputs a little bit.” — Lex Fridman (MIT)
Hands-on Python Example on Linear Regression
Let’s take an example of some data that shows a roughly linear relationship between page speed and amount purchased:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

pageSpeeds = np.random.normal(3.0, 1.0, 1000)
purchaseAmount = 100 - (pageSpeeds + np.random.normal(0, 0.1, 1000)) * 3

plt.scatter(pageSpeeds, purchaseAmount)
Output: a scatter plot of page speed vs. purchase amount.
So you can see that there’s definitely a linear relationship.
As we only have two variables, we can keep it simple and just use scipy.stats.linregress:
from scipy import stats
slope, intercept, r_value, p_value, std_err = stats.linregress(pageSpeeds, purchaseAmount)
Not surprisingly, our R-squared value shows a really good fit:
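To see that number yourself, square the r_value that linregress returns. Here's the whole cell again, self-contained (with a fixed seed so the result is reproducible):

```python
import numpy as np
from scipy import stats

np.random.seed(2)
pageSpeeds = np.random.normal(3.0, 1.0, 1000)
purchaseAmount = 100 - (pageSpeeds + np.random.normal(0, 0.1, 1000)) * 3

slope, intercept, r_value, p_value, std_err = stats.linregress(pageSpeeds, purchaseAmount)
print(r_value ** 2)  # close to 1, since the noise is small relative to the trend
```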
Let’s use the slope and intercept we got from the regression to plot predicted values vs observed:
import matplotlib.pyplot as plt
def predict(x):
    return slope * x + intercept
fitLine = predict(pageSpeeds)
plt.scatter(pageSpeeds, purchaseAmount)
plt.plot(pageSpeeds, fitLine, c='r')
plt.show()
Output: the same scatter plot with the fitted line drawn in red.
Thank you for reading this post; I hope you enjoyed it and learned something new today. Feel free to contact me through my blog if you have questions, and I will be more than happy to help.
Stay safe and Happy learning!