What is Regression Analysis in Machine Learning? Explained Simply


Regression analysis is a statistical method for modeling the relationship between a dependent (target) variable and one or more independent (predictor) variables. More specifically, regression analysis shows how the value of the dependent variable changes when one independent variable varies while the other independent variables are held constant. It is used to predict continuous, real-valued quantities such as temperature, age, salary, and price. The following illustration helps clarify the concept: consider a marketing company, A, that runs a variety of advertisements each year and earns sales revenue from them.

Example

The following table lists the company's advertisement spend and the corresponding sales over the past five years:

[Table: Regression Analysis Example — advertisement spend and corresponding sales for the past five years]


Now the company wants to estimate its sales for 2019, given a planned advertisement spend of $200. Regression analysis is the tool for addressing this kind of prediction problem in machine learning.
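Under the hood, a prediction like this can be made by fitting a straight line to the historical data. The sketch below uses hypothetical advertisement/sales figures (the company's actual table is not reproduced here) together with the closed-form least-squares formulas for the slope and intercept:

```python
# Simple linear regression fit by ordinary least squares.
# The advertisement/sales figures below are hypothetical examples,
# not the company's actual data.
ads = [100.0, 120.0, 150.0, 180.0]        # advertisement spend per year
sales = [1050.0, 1250.0, 1550.0, 1850.0]  # sales revenue per year

n = len(ads)
mean_x = sum(ads) / n
mean_y = sum(sales) / n

# slope = covariance(x, y) / variance(x); intercept from the means
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(ads, sales)) / \
        sum((x - mean_x) ** 2 for x in ads)
intercept = mean_y - slope * mean_x

predicted_2019_sales = intercept + slope * 200.0
print(predicted_2019_sales)  # 2050.0 for this exactly linear toy data
```

Because the toy data lie exactly on a line, the fit recovers slope 10 and intercept 50; with noisy real data the same formulas give the best-fitting line rather than an exact one.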


Regression is a supervised learning method that helps us determine the correlation between variables and lets us forecast a continuous output variable based on one or more predictor variables. Its main applications are prediction, forecasting, time series modeling, and determining cause-and-effect relationships between variables.

Regression involves finding a graph between the variables that best fits the given datapoints; the machine learning model can then use this plot to make predictions. Put simply, "Regression fits a line or curve on the target-predictor graph in a way that minimizes the vertical distances between the datapoints and it." One can determine whether a model has captured a strong relationship or not by measuring the distance between the datapoints and the line.
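That distance is usually quantified as the sum of squared residuals (the vertical gaps between points and the line). A minimal sketch, with made-up points and an illustrative candidate line:

```python
# Sum of squared residuals: the quantity a least-squares regression
# line minimizes. Points and line coefficients are illustrative.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
slope, intercept = 2.0, 0.0  # candidate line y = 2x

residuals = [y - (slope * x + intercept) for x, y in points]
sse = sum(r ** 2 for r in residuals)
print(sse)  # small value: the line fits these points closely
```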

Here are a few examples of regression:


Forecasting rainfall based on temperature and other variables
Identifying trends in the market
Predicting road accidents caused by reckless driving


Why do we use Regression Analysis?

Regression analysis is used to predict a continuous variable. In the real world there are many situations where we need to forecast future values, such as weather conditions, sales figures, or marketing trends, and where we want those forecasts to be as precise as possible. Regression analysis is the statistical technique used in data science and machine learning for exactly these situations.


Terminologies Related to Regression Analysis in Machine Learning


Response variable: Sometimes referred to as the dependent or target variable, this is the main variable that a regression aims to understand or predict.

Predictor variables: Also known as independent variables, these are the variables that influence the response variable and are used to forecast its values.

Outliers: Observations whose values differ greatly from the other observations; they can distort the fitted model and therefore need to be handled carefully.

Multicollinearity: High correlation between independent variables, which makes it harder to rank the variables by importance.

Underfitting and Overfitting: Overfitting occurs when an algorithm performs well on training data but poorly on test data, whereas underfitting denotes poor performance on both.


Types of Regression

Linear Regression

Linear regression is one of the most fundamental and widely used statistical models. It assumes that the dependent and independent variables have a linear relationship, meaning that a change in the independent variables produces a proportional change in the dependent variable.


Logistic regression

Logistic regression is a supervised learning algorithm used to solve classification problems, where the dependent variable is binary or discrete, such as 0 or 1.
The algorithm works with categorical target values such as 0 or 1, Yes or No, and so on.

Although both belong to the regression family, logistic regression and linear regression are applied differently.
Logistic regression employs a more complex cost function based on the sigmoid (logistic) function, and it uses this sigmoid function to model the data.
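The sigmoid maps any real-valued input to a probability between 0 and 1, which is then thresholded (typically at 0.5) to produce a class label. A minimal sketch, with illustrative weights rather than learned ones:

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# A linear score w*x + b is squashed into a probability, then thresholded.
w, b = 1.5, -2.0   # illustrative weights, not learned here
x = 2.0
prob = sigmoid(w * x + b)
label = 1 if prob >= 0.5 else 0
print(prob, label)  # probability above 0.5, so the label is 1
```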



Polynomial regression

Polynomial regression is used to model nonlinear relationships between the dependent variable and the independent variables: polynomial terms are added to the linear regression model to capture more intricate relationships.
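The key trick is that adding polynomial terms as extra features turns a nonlinear fit into an ordinary linear one. The sketch below fits the nonlinear target y = x² exactly by regressing y on the expanded feature z = x² (toy data, for illustration only):

```python
# Polynomial regression as linear regression on expanded features.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [x ** 2 for x in xs]   # nonlinear target: y = x^2

zs = [x ** 2 for x in xs]   # expanded (polynomial) feature z = x^2

# Ordinary least squares of y on the expanded feature z
n = len(zs)
mean_z = sum(zs) / n
mean_y = sum(ys) / n
slope = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys)) / \
        sum((z - mean_z) ** 2 for z in zs)
intercept = mean_y - slope * mean_z

print(slope, intercept)  # ~1.0 and ~0.0: y is perfectly linear in x^2
```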


Support Vector Machine

Support Vector Machine (SVM) is a supervised learning technique that can be applied to both classification and regression problems.

Its regression variant, Support Vector Regression (SVR), finds a hyperplane such that as many datapoints as possible lie within a margin of tolerance (epsilon) around it; only points falling outside that margin are penalized.
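SVR's notion of error therefore differs from ordinary least squares: deviations smaller than epsilon cost nothing at all. A minimal sketch of this epsilon-insensitive loss (the function name and numbers are illustrative):

```python
def epsilon_insensitive_loss(y_true, y_pred, eps=0.5):
    """Epsilon-insensitive loss used in SVR:
    errors within +/- eps of the target are ignored entirely."""
    return sum(max(0.0, abs(t - p) - eps) for t, p in zip(y_true, y_pred))

y_true = [3.0, 5.0, 7.0]
y_pred = [3.2, 5.9, 7.1]   # absolute errors: 0.2, 0.9, 0.1

loss = epsilon_insensitive_loss(y_true, y_pred, eps=0.5)
print(loss)  # only the 0.9 error exceeds eps, contributing 0.9 - 0.5
```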


Decision Tree Algorithm

The decision tree algorithm is a supervised learning algorithm that can be used for both classification and regression problems.

A decision tree is a tree-like structure of nodes and branches, where each internal node represents a decision and each branch represents its outcome. The aim of decision tree regression is to build a tree that can accurately predict the target value for new data points.


Random forest regression

Random forest regression is an ensemble method that uses several decision trees to predict a target value. An ensemble method is a kind of machine learning algorithm that combines several models to improve overall performance.

Random forest regression works by constructing many decision trees and training each on a different random subset of the training data. The final forecast is the average of the individual trees' predictions.
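The bootstrap-and-average mechanics can be sketched without a full tree implementation; here each "tree" is replaced by a trivial mean predictor fit on a bootstrap resample, so only the ensemble idea is shown (illustrative, not a real random forest):

```python
import random

random.seed(0)  # deterministic bootstrap samples for the example

ys = [10.0, 12.0, 11.0, 13.0, 9.0]  # illustrative training targets

# Each "model" is a trivial mean predictor fit on a bootstrap resample;
# a real random forest would fit a full decision tree per sample.
predictions = []
for _ in range(100):
    sample = random.choices(ys, k=len(ys))  # sample with replacement
    predictions.append(sum(sample) / len(sample))

forest_prediction = sum(predictions) / len(predictions)
print(forest_prediction)  # close to the overall mean of 11.0
```

Averaging many models trained on resampled data reduces variance, which is why a forest usually generalizes better than any single tree.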


Ridge regression

Ridge regression is a regularized variant of linear regression that improves generalization by introducing a small amount of bias.
That bias comes from the ridge penalty, which is computed by multiplying the sum of the squared feature weights by a regularization parameter, lambda.
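The penalty's effect is easiest to see in the one-dimensional case with no intercept, where the ridge solution has the closed form w = Σxy / (Σx² + λ). A minimal sketch with illustrative data, showing how larger lambda shrinks the weight toward zero:

```python
# One-dimensional ridge regression (no intercept): minimizing
# sum((y - w*x)^2) + lam * w^2 gives w = sum(x*y) / (sum(x^2) + lam).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # true relationship y = 2x

def ridge_weight(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / \
           (sum(x * x for x in xs) + lam)

print(ridge_weight(xs, ys, 0.0))    # 2.0: plain least squares
print(ridge_weight(xs, ys, 14.0))   # 1.0: the penalty shrinks the weight
```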


Lasso regression

Lasso regression is another regularization technique that reduces model complexity.
It is similar to ridge regression, except that the penalty term uses the absolute values of the weights instead of their squares, which can shrink some weights exactly to zero.
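In one dimension (no intercept) the lasso solution is a soft-thresholded least-squares weight, and, unlike ridge, a large enough lambda drives the weight exactly to zero. A minimal sketch under those assumptions:

```python
# One-dimensional lasso (no intercept): minimizing
# sum((y - w*x)^2) + lam * |w| has the soft-thresholding solution below.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # true relationship y = 2x

def lasso_weight(xs, ys, lam):
    rho = sum(x * y for x, y in zip(xs, ys))  # 28 for this data
    sx2 = sum(x * x for x in xs)              # 14 for this data
    if abs(rho) <= lam / 2.0:
        return 0.0                            # penalty zeroes the weight
    shrunk = abs(rho) - lam / 2.0
    return (1.0 if rho > 0 else -1.0) * shrunk / sx2

print(lasso_weight(xs, ys, 0.0))    # 2.0: plain least squares
print(lasso_weight(xs, ys, 28.0))   # 1.0: weight shrunk toward zero
print(lasso_weight(xs, ys, 60.0))   # 0.0: weight driven exactly to zero
```

The exact zeros are what make lasso useful for feature selection: features whose weights are zeroed out are effectively dropped from the model.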



Applications of Regression


Pricing prediction: A regression model, for instance, could be used to forecast a house's price based on its size, location, and other characteristics.

Trend forecasting: For instance, a regression model based on past sales data and economic indicators could be used to project a product's sales.

Finding risk factors: Using patient data, a regression model, for instance, could be used to find heart disease risk factors.

Making decisions: Regression models, for instance, can be used to suggest investments based on market data.