Regression is one of the two main types of supervised learning; the other is classification. For example, where classification predicts whether tomorrow's weather will be hot, cold, rainy, or windy, regression predicts the exact temperature. The five-day forecast on our smartphones is the most apparent example. You can learn more about the different machine learning techniques and algorithms here.
Regression is a technique for investigating the relationship between independent variables or features and a dependent variable or outcome.
For example, an e-commerce company runs remarketing campaigns on Christmas products every year. Below is a table of what they spent on Christmas remarketing campaigns over the past five years.
Since they have seen sales go up every time they run a remarketing ad during Christmas, they want to predict how much sales they would get if they spent $4,000 in the upcoming Christmas season.
We can use regression to predict sales for the upcoming season. Regression is a supervised machine learning technique, so the model learns from labeled input data. In other words, regression fits a line or curve to the data points on a target-predictor graph so that the vertical distance between the data points and the regression line is as small as possible. The line rarely passes through every point exactly.
Some of the areas where regression analysis is most used are:
As mentioned above, regression is essential to critical aspects of our daily life, such as weather forecasting, financial analysis, road accident prevention, and marketing analysis. Regression is a statistical method widely used in data science, and machine learning needs it for the same reasons.
Some of the reasons regression is essential are:
Several regression algorithms are used for various use cases in machine learning, and each has its own strengths. At their core, though, all of them analyze the effect of independent variables on a dependent variable.
Linear regression is a statistical method used for predictive analysis. It is a simple algorithm that models the relationship between continuous variables. If there is only one input variable (x), it is called simple linear regression, and if there are multiple input variables, it is called multiple linear regression.
Linear regression predicts the dependent variable (y) from an independent variable (x) by finding the linear relationship between x and y.
The equation of linear regression is as follows.
$$y = a + bx$$
Where:
y = the dependent variable, plotted along the y-axis
x = the independent variable, plotted along the x-axis
b = the slope of the line
a = the intercept
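To make the equation concrete, here is a minimal sketch of fitting a and b with ordinary least squares in plain Python. The spend and sales figures are invented for illustration (they are not the company's actual numbers), and the function name is hypothetical.

```python
# Hypothetical data: ad spend (x) and sales (y) in dollars for five past seasons.
# These numbers are invented for illustration; they are not from the article's table.
spend = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
sales = [12000.0, 17000.0, 22000.0, 27000.0, 32000.0]


def fit_simple_linear_regression(x, y):
    """Return (a, b) for y = a + b*x using ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope b = covariance(x, y) / variance(x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    # Intercept a makes the line pass through the point of means
    a = mean_y - b * mean_x
    return a, b


a, b = fit_simple_linear_regression(spend, sales)
predicted_sales = a + b * 4000.0
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
print(f"predicted sales at $4,000 spend: {predicted_sales:.2f}")
```

With real data the points will not sit exactly on a line, so the prediction should be read as an estimate rather than a guarantee.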
Some areas where linear regression is used are:
Logistic regression is also a supervised machine learning technique. It is used to predict the probability of an event occurring when the dependent variable is binary (yes/no), based on the independent variables. For example, predicting whether someone is COVID-19 positive or negative.
If the possible outcome is binary, i.e., 0/1 or yes/no, it is called binary logistic regression. If there are more than two possible outcomes with no natural order, it is called multinomial logistic regression. And if the outcomes are ordered, for example, when detecting the severity of a COVID-19 infection, it is called ordinal logistic regression.
Logistic regression uses a sigmoid function, also called the logistic function, to determine the probability of an event occurring. Values above the threshold are rounded up to 1, and values below it are rounded down to 0. The equation is as follows.
$$f(x) = \frac{1}{1 + e^{-x}}$$
Where:
f(x) = the output, between 0 and 1
x = the input to the function
e = the base of the natural logarithm
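As a small sketch, the sigmoid function and the thresholding step could look like this in Python. The 0.5 threshold and the choice to send a probability exactly at the threshold to 1 are assumptions made for this example; the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function: maps any real number to a value between 0 and 1."""
    # Two algebraically equal branches avoid overflow for large |x|
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    exp_x = math.exp(x)
    return exp_x / (1.0 + exp_x)


def classify(x, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(x) >= threshold else 0


for score in (-2.0, 0.0, 2.0):
    print(score, round(sigmoid(score), 3), classify(score))
```

In a full logistic regression model, the input x would be a weighted sum of the features; here it is passed in directly to keep the example short.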
Some of the use cases of logistic regression are:
Polynomial regression is used where the relationship in the data is non-linear and linear regression does not give an accurate prediction. It should be applied only if there is a non-linear relationship between the dependent and independent variables.
Polynomial regression is very sensitive to outliers, so much so that even a single outlier can affect its performance. It works by modeling the relationship between the variables as an nth-degree polynomial.
The equation for polynomial regression is
$$y = b_0 + b_1 x_1 + b_2 x_1^2 + \dots + b_n x_1^n$$
Where y is the output and n is the degree of the polynomial.
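One way to estimate the coefficients b0 to bn is least squares on the powers of x. The sketch below does this in plain Python by solving the normal equations; the data is generated from a known quadratic purely for illustration, and all function names are hypothetical. A library routine would usually be preferred in practice.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * coeffs = rhs with Gaussian elimination and partial pivoting."""
    n = len(rhs)
    # Build an augmented matrix so row operations apply to both sides
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (aug[row][n] - known) / aug[row][row]
    return coeffs


def fit_polynomial(x, y, degree):
    """Least-squares fit of y = b0 + b1*x + ... + bn*x^n via the normal equations."""
    size = degree + 1
    # Each row of the design matrix holds the powers 1, x, x^2, ..., x^degree
    design = [[xi ** p for p in range(size)] for xi in x]
    # Normal equations: (D^T D) b = D^T y
    dtd = [[sum(row[i] * row[j] for row in design) for j in range(size)]
           for i in range(size)]
    dty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(size)]
    return solve_linear_system(dtd, dty)


def predict_polynomial(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(b * x ** p for p, b in enumerate(coeffs))


# Hypothetical data generated from y = 1 + 2x + 3x^2 (no noise)
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * v + 3.0 * v ** 2 for v in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in coeffs])
print(round(predict_polynomial(coeffs, 4.0), 6))
```

Because the fit is still linear in the coefficients, polynomial regression is really linear regression on transformed inputs, which is why the same least-squares machinery applies.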
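Support vector regression works by finding a hyperplane in an n-dimensional space that fits the data points as closely as possible, rather than separating them into classes as a support vector classifier does. The data points that lie on or outside the boundary lines around the hyperplane are called the support vectors.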
Some of the key terms used in SVR are:
Kernel - A function that transforms the data into the required form in a higher-dimensional space, where the hyperplane is found. Some of the most used kernels are linear, sigmoid, polynomial, and RBF (radial basis function).
Hyperplane - The line or surface used to predict the continuous target variable. The data points that lie on or outside the boundary lines around it are known as support vectors.
Boundary line - One of the two lines drawn on either side of the hyperplane, creating a margin for the data points.
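The role of the margin can be sketched with the epsilon-insensitive loss used in the standard SVR formulation, which this article does not spell out. Errors that fall inside the margin cost nothing, and errors outside it grow linearly. The line, the margin width epsilon, and the data points below are hypothetical values chosen only for illustration.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Loss is zero inside the margin and grows linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def predict(x, weight, bias):
    """With a single feature, the hyperplane is the line y = weight * x + bias."""
    return weight * x + bias


# Hypothetical line and margin width, chosen only for illustration
weight, bias, epsilon = 2.0, 1.0, 0.5

# (x, y) pairs: the first sits inside the margin, the second outside it
points = [(1.0, 3.2), (2.0, 6.5)]
losses = []
for x, y in points:
    loss = epsilon_insensitive_loss(y, predict(x, weight, bias), epsilon)
    losses.append(loss)
    print(f"x={x}, y={y}, loss={loss:.2f}")
```

Training an SVR means choosing the weight and bias that keep this loss small while keeping the model as flat as possible; that optimization is left out of this sketch.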
Decision tree regression builds a regression model in the form of a tree structure. It works by breaking a data set down into smaller and smaller subsets while incrementally developing a decision tree. The resulting tree contains decision nodes and leaf nodes.
As you can see from the above diagram, the decision tree builds a tree-like structure in which each decision node represents a test on a specified attribute, each branch represents an outcome of that test, and each leaf represents the final output.
Some of the key terms used in decision trees are:
Root Node - The start of the decision tree, from which the data gets divided into two or more sets.
Leaf Node - A final output node, beyond which the tree cannot be split further.
Splitting - The process of dividing a node into subsets according to a given condition.
Sub-tree - A section of the tree formed by splitting it.
Pruning - The removal of unwanted branches from the tree.
Parent/Child Node - A node that is split into further nodes is known as a parent node, and the nodes that branch out from it are known as its child nodes. The root node is the topmost parent.
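To show splitting in action, here is a minimal sketch of the smallest possible regression tree: a root node with one split and two leaf nodes. It assumes a single feature, picks the split that minimizes the squared error, and predicts the mean of each leaf. The data and function names are hypothetical, and real decision trees repeat this step recursively on each child node.

```python
def mean(values):
    return sum(values) / len(values)


def sum_squared_error(values):
    """Total squared deviation of the values from their mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Find the threshold on x that minimises the combined squared error of both sides."""
    best_threshold, best_error = None, float("inf")
    unique_x = sorted(set(x))
    # Candidate thresholds are midpoints between neighbouring x values
    for lo, hi in zip(unique_x, unique_x[1:]):
        threshold = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        error = sum_squared_error(left) + sum_squared_error(right)
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold


def fit_stump(x, y):
    """Build a tree with one decision node (the root) and two leaf nodes.

    Assumes x contains at least two distinct values.
    """
    threshold = best_split(x, y)
    left_leaf = mean([yi for xi, yi in zip(x, y) if xi <= threshold])
    right_leaf = mean([yi for xi, yi in zip(x, y) if xi > threshold])
    return threshold, left_leaf, right_leaf


def predict_stump(stump, x_new):
    """Follow the branch that matches the test and return that leaf's value."""
    threshold, left_leaf, right_leaf = stump
    return left_leaf if x_new <= threshold else right_leaf


# Hypothetical data with an obvious jump between x = 3 and x = 4
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [10.0, 11.0, 12.0, 30.0, 31.0, 32.0]
stump = fit_stump(xs, ys)
print(stump)
print(predict_stump(stump, 2.5), predict_stump(stump, 10.0))
```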
Random forest regression combines multiple decision trees to determine the output rather than depending on an individual decision tree. It is based on ensemble learning, the process of combining multiple models to improve overall performance. A higher number of decision trees generally improves the accuracy of random forest regression, although the gains level off as more trees are added.
Random forest works by selecting K random data points from the training set and building a decision tree on them. We then choose the number N of trees to build and repeat these steps. For a new data point, each tree makes its own prediction: in classification, the point is assigned to the category that wins the majority of the votes, while in regression, the final output is the average of the trees' predictions.
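The sampling and combining steps can be sketched as follows, with one simplification: each tree here is a one-split tree on a single feature, so the random feature selection that full random forests also use is left out. For regression, the trees' predictions are averaged rather than put to a vote. The data, the seed, and all function names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def fit_stump(points):
    """Fit a one-split regression tree to a list of (x, y) pairs."""
    xs = sorted({x for x, _ in points})
    if len(xs) < 2:
        # Cannot split: fall back to a single leaf predicting the mean
        leaf = mean([y for _, y in points])
        return xs[0], leaf, leaf
    best = None
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_forest(points, n_trees, sample_size, seed=0):
    """Train n_trees (N) stumps, each on sample_size (K) points drawn with replacement."""
    rng = random.Random(seed)
    return [fit_stump(rng.choices(points, k=sample_size)) for _ in range(n_trees)]


def predict_forest(forest, x_new):
    """Average the predictions of all trees (the regression counterpart of voting)."""
    predictions = [left if x_new <= threshold else right
                   for threshold, left, right in forest]
    return mean(predictions)


# Hypothetical (x, y) pairs, invented for illustration
data = [(1.0, 10.0), (2.0, 11.0), (3.0, 12.0), (4.0, 30.0), (5.0, 31.0), (6.0, 32.0)]
forest = fit_forest(data, n_trees=25, sample_size=len(data))
print(round(predict_forest(forest, 2.0), 2), round(predict_forest(forest, 5.0), 2))
```

Because every tree sees a slightly different sample, individual trees disagree, and averaging them smooths out the errors any single tree would make.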
“Ridge regression is the method of estimating the coefficients of multiple regression models where independent variables are highly correlated.”
$$Y = XB + e$$
The above is the linear model that ridge regression estimates, where Y is the dependent variable, X is the matrix of independent variables, B is the vector of regression coefficients, and e is the residuals. Ridge regression is a regularization technique, also known as L2 regularization: it reduces the complexity of a model by adding a penalty on the squared size of the coefficients.
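The equation above describes the model itself; what sets ridge regression apart is how B is estimated. In the standard formulation, a squared (L2) penalty on the coefficients is added to the least-squares objective. The symbol λ for the penalty strength is notation introduced here and does not appear elsewhere in this article.

```latex
\hat{B}_{\text{ridge}}
  = \arg\min_{B} \; \lVert Y - XB \rVert_2^2 + \lambda \lVert B \rVert_2^2
  = (X^\top X + \lambda I)^{-1} X^\top Y,
  \qquad \lambda \ge 0
```

Setting λ to 0 recovers ordinary least squares, and larger values shrink the coefficients toward zero, which stabilizes the estimates when the independent variables are highly correlated.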
Like ridge regression, lasso regression is a regularization technique that reduces the model's complexity. The only difference is that the penalty term uses the absolute values of the weights instead of their squares. It is also called L1 regularization.
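The practical effect of the two penalties is easiest to see with a single coefficient and no intercept, where both problems have a closed-form solution. This is a simplified sketch under that assumption: the penalty strength lam, the data, and the function names are hypothetical, and the objectives are written out in the docstrings.

```python
def ridge_coefficient(x, y, lam):
    """Minimise sum((y - b*x)^2) + lam * b^2 for a single coefficient b."""
    s_xy = sum(xi * yi for xi, yi in zip(x, y))
    s_xx = sum(xi * xi for xi in x)
    return s_xy / (s_xx + lam)


def lasso_coefficient(x, y, lam):
    """Minimise sum((y - b*x)^2) + lam * |b| for a single coefficient b."""
    s_xy = sum(xi * yi for xi, yi in zip(x, y))
    s_xx = sum(xi * xi for xi in x)
    # Soft thresholding: shrink towards zero and clip at exactly zero
    shrunk = max(abs(s_xy) - lam / 2.0, 0.0)
    return (shrunk if s_xy >= 0 else -shrunk) / s_xx


# Hypothetical data with a weak relationship between x and y
xs = [-1.0, 0.0, 1.0]
ys = [-0.5, 0.0, 0.5]
for lam in (0.0, 1.0, 4.0):
    print(lam, ridge_coefficient(xs, ys, lam), lasso_coefficient(xs, ys, lam))
```

As the penalty grows, the ridge coefficient keeps shrinking but stays non-zero, while the lasso coefficient reaches exactly zero. This is why lasso is often used for feature selection as well as regularization.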
This article reviewed the regression algorithms most commonly used in industry to solve different data science problems. You can read about the full range of machine learning training algorithms here, and if you’re looking to deploy machine learning models into production, you can read about deployment here.