
ML Model Training: Supervised, Unsupervised & Reinforcement Learning - Part 2

Jul 27, 2022


In the previous part, I talked about the data collection and preprocessing stages of the model training process. In this part, we'll review the training process itself for three types of learning: supervised, unsupervised, and reinforcement learning. We'll go over each with some examples. You can check out the entire list of learning algorithms here.

Training:

[Figure: Algorithms for machine learning]

Supervised learning:

In supervised learning, the model is trained on labeled data, where each input example is tagged with the output it should produce. In other words, the input data tells the model which output we expect. We provide the model with both the input data and the correct results, and it learns a function that maps the input variables to the output variable.

There are two main types of supervised learning: classification and regression.

[Figure: Classification and regression]

In classification, the model learns from labeled historical data to assign each input to a discrete category. Regression predicts a continuous value, such as a student's percentage score. Classification predicts a category, such as whether the student passed or failed.
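Here is a minimal sketch of the student example in Python. The numbers are made up. The same hypothetical input (hours studied) is paired once with a continuous target (percentage score) and once with a discrete target (pass/fail). The regression half fits a least-squares line. The classification half learns a single pass/fail cut-off (a decision stump), which was chosen only because it is the simplest classifier to show.

```python
# Hypothetical toy data: hours studied per student (the input feature)
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = [35.0, 45.0, 55.0, 65.0, 75.0, 85.0]     # regression target: continuous percentage
passed = [False, False, False, True, True, True]  # classification target: discrete label


def fit_line(xs, ys):
    """Regression: least-squares line y = intercept + slope * x."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - slope * mean_x, slope


def fit_threshold(xs, labels):
    """Classification: pick the cut-off on x that misclassifies the fewest examples."""
    ordered = sorted(xs)
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    return min(candidates, key=lambda t: sum((x >= t) != y for x, y in zip(xs, labels)))


intercept, slope = fit_line(hours, scores)
threshold = fit_threshold(hours, passed)

new_student = 4.5
predicted_score = intercept + slope * new_student  # a number on a continuous scale
predicted_pass = new_student >= threshold          # one of two categories
print(predicted_score, predicted_pass)
```

`predicted_score` is a number on a continuous scale, while `predicted_pass` is one of two categories. That difference in output type is what separates regression from classification.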

Classification is used in many areas. Examples include weather forecasting, spam detection, identifying objects in a picture, and determining whether a person requires intensive care.

Classification is further divided into binary classification and multinomial classification.

Binary classification sorts data into two categories, such as yes/no. Examples include whether a person has pneumonia, whether a student has passed, and whether an email is spam.

Multinomial (multiclass) classification sorts data into more than two categories, as in document classification, malware classification, and product classification.

One of the most common uses of classification is spam detection. Many email providers use algorithms such as Naive Bayes and Support Vector Machines for spam filtering. The model is trained on historical emails labeled as spam or not spam, and its predictions are then used to filter incoming mail.
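To make the spam-filtering idea concrete, here is a minimal multinomial Naive Bayes classifier written from scratch. It is a sketch, not how any particular email provider implements filtering, and it rests on these assumptions:

* The four training emails are invented.
* Tokenization is a plain whitespace split.
* Add-one (Laplace) smoothing is used, so an unseen word/label combination does not zero out a probability.

```python
import math
from collections import Counter


def train_naive_bayes(emails):
    """emails: list of (text, label). Count documents per label and words per label."""
    doc_counts = Counter()
    word_counts = {}
    vocabulary = set()
    for text, label in emails:
        words = text.lower().split()
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocabulary.update(words)
    return doc_counts, word_counts, vocabulary


def classify(model, text):
    """Return the label with the highest log P(label) + sum of log P(word | label)."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in doc_counts.items():
        score = math.log(n_docs / total_docs)  # prior
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are skipped
                # Add-one (Laplace) smoothing keeps unseen word/label pairs above zero
                score += math.log(
                    (word_counts[label][word] + 1) / (n_words + len(vocabulary))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training_emails = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the team on monday", "ham"),
]
model = train_naive_bayes(training_emails)
print(classify(model, "free prize money"))        # spam
print(classify(model, "team meeting on monday"))  # ham
```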

Other areas where classification is used include document classification, consumer behavior prediction, image classification, and click-through rate (CTR) prediction for ads. It is also used for churn prediction and for BFSI (banking, financial services, and insurance) tasks such as credit card fraud detection, anomaly detection, and credit checks.

Some of the algorithms used in classification are:

K-Nearest Neighbours
Kernel SVM
Naïve Bayes
Decision Tree Classification
Random Forest Classification
Logistic Regression
Support Vector Machines
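Of the algorithms listed, K-Nearest Neighbours is the easiest to sketch. The version below is a from-scratch illustration on invented 2-D points. The three product categories are hypothetical, so it also shows multinomial classification. A new point takes the majority label of its `k` closest training points.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to the query
    ranked = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented 2-D feature vectors for three hypothetical product categories
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (2, 9), (1, 9)]
labels = ["books"] * 3 + ["toys"] * 3 + ["tools"] * 3

for query in [(1.5, 1.5), (8.5, 8.5), (1.5, 8.5)]:
    print(query, "->", knn_predict(points, labels, query))
```

Choosing `k` is a trade-off. A very small `k` follows noise in the training data, while a very large one blurs the boundaries between categories.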

Regression is also trained on input features paired with output labels, but the labels are continuous numeric values rather than categories. It estimates how one variable affects another by modeling the relationship between them.

For example, suppose you want to choose a car based on its fuel efficiency. You can model a car's miles per gallon (mpg) as a function of features such as horsepower (bhp), engine displacement, and weight. The fitted regression then estimates the mpg of different car models, so you can predict which model would suit you.
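The car example can be sketched as a multiple linear regression fitted by ordinary least squares, here by solving the normal equations. This rests on several assumptions:

* The data is invented and uses only two of the features (bhp and weight) to keep it short.
* It was generated from an assumed relationship, mpg = 50 - 0.05 * bhp - 5 * weight, with weight in thousands of pounds, so the fit should recover those coefficients.
* Real car data would be noisy, and a numerical library would normally do the solving.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_regression(features, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    X = [[1.0] + list(row) for row in features]  # leading 1.0 = intercept term
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict_mpg(weights, row):
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


# Invented cars: (bhp, weight in 1000 lb) -> mpg, generated from the assumed formula
cars = [(100, 2.0), (150, 3.0), (200, 3.5), (120, 2.5), (180, 4.0)]
mpg = [35.0, 27.5, 22.5, 31.5, 21.0]

weights = fit_linear_regression(cars, mpg)
print("intercept, bhp coef, weight coef:", [round(w, 3) for w in weights])
print("predicted mpg for 130 bhp, 2,800 lb:", round(predict_mpg(weights, (130, 2.8)), 2))
```

Because the toy data is exactly linear, the fitted weights come back as roughly [50, -0.05, -5]. With real data, you would check how well the model fits before trusting its predictions.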

Regression is widely used to predict consumer behavior. For example, say you work at an e-commerce startup that deals with millions of stores and millions of products. How do you predict which items consumers are likely to buy, or which items they are likely to buy in different states or countries? E-commerce stores like Amazon and Walmart use linear regression (one of the regression algorithms) to predict such behavior, which lets them advertise those products.

Some of the algorithms used in regression are:

Linear Regression
Decision Tree
Support Vector Regression
Lasso Regression
Random Forest

Unsupervised learning:

[Figure: Unsupervised machine learning]

Unlike supervised learning, unsupervised learning does not guide the algorithm toward a predefined outcome. Instead, the algorithm finds hidden patterns and insights in the input data on its own. We are not always lucky enough to have structured, labeled data, and unsupervised learning is used in exactly those cases.

For example, in supervised learning you might give the algorithm a set of images and ask it to tell cats from dogs, supplying the expected outcome (the label) for each image. In unsupervised learning, we give the algorithm no such labels or desired results. It receives only the data and identifies the similarities and differences within it.

There are two main types of unsupervised learning: clustering and association.

[Figure: Unsupervised machine learning diagram]

Clustering is the process of grouping similar data points together to find similarities and differences within the input. It is helpful when large amounts of unlabeled data need to be segmented to understand the underlying meaning, patterns, and behavior.

Let us use the same example of an e-commerce company with millions of stores and products. We want to group customers by their buying patterns, such as high spenders, low spenders, window shoppers, and average spenders.

We have already set up data collection through various sources. Now we need to segment customers into these categories so that we can run targeted ads based on their spending behavior, and clustering is very helpful for this.
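Here is a minimal k-means sketch of this customer segmentation. It makes the following assumptions:

* Each hypothetical customer is described by two invented features: monthly spend in dollars and store visits per month.
* Three groups are expected: high spenders, low spenders, and window shoppers.
* For reproducibility, the first three customers seed the centroids. Library implementations usually use smarter seeding, such as k-means++.

In practice you would also scale the features, since here spend dominates the distance.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Basic k-means (Lloyd's algorithm). Returns (centroids, cluster index per point)."""
    # Simplification: seed with the first k points so the run is reproducible
    centroids = [tuple(p) for p in points[:k]]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each customer joins the cluster with the nearest centroid
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no customer changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignments


# Invented customers: (monthly spend in $, store visits per month)
customers = [
    (900, 4), (50, 2), (40, 20),  # first three seed the clusters
    (950, 5), (880, 3),
    (60, 1), (45, 3),
    (30, 18), (55, 22),
]
centroids, labels = kmeans(customers, k=3)
for c, centre in enumerate(centroids):
    members = [cust for cust, label in zip(customers, labels) if label == c]
    print(f"cluster {c}: centre={centre}, customers={members}")
```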

Some of the algorithms used in clustering are:

K-means clustering
Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
Gaussian Mixture Model
Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH)
Affinity Propagation clustering
Mean-Shift clustering
Ordering Points To Identify the Clustering Structure (OPTICS)
Agglomerative hierarchical clustering

Association (association rule learning) is based on dependency. It identifies relationships between items in a dataset, such as items that frequently occur together, and maps them accordingly.

For example, when we shop on Amazon, we see recommendations such as 'Customers also bought'. Recommendations like these, based on buying patterns, can be produced through association.

Again, let's go with the same e-commerce store we have used throughout the article. Let's say you want to identify people with similar buying patterns. For example, people who bought phones might look for accessories, and people who bought shoes might look for socks. We would be able to do so with association learning.
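Association rules are usually scored with two measures. Support is how often items appear together across all transactions. Confidence is how often the consequent appears when the antecedent does. The sketch below computes both for the phone/accessory and shoes/socks examples over five invented shopping baskets.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent),
    an estimate of P(consequent in basket | antecedent in basket)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Invented shopping baskets, one set of items per transaction
baskets = [
    {"phone", "phone case"},
    {"phone", "charger"},
    {"phone", "phone case", "charger"},
    {"shoes", "socks"},
    {"shoes"},
]

print(support(baskets, {"phone", "phone case"}))       # 0.4
print(confidence(baskets, {"phone"}, {"phone case"}))  # 2 of the 3 phone baskets
print(confidence(baskets, {"shoes"}, {"socks"}))       # 1 of the 2 shoe baskets
```

With these baskets, the rule 'phone → phone case' has support 0.4 and a confidence of about 0.67, because two of the three phone buyers also bought a case.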

Some of the algorithms used in association are:

Apriori
Eclat
FP-Growth
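Apriori finds the itemsets that are frequent enough to build rules from. The minimal, unoptimized sketch below grows candidate itemsets one item at a time. It prunes any candidate that has an infrequent subset, which is known as the Apriori property. The baskets are invented, and the minimum support of 0.4 is an arbitrary choice.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    # Level 1: every individual item is a candidate
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    size = 1
    while candidates:
        # Count each candidate's support and keep the frequent ones
        level = {}
        for itemset in candidates:
            s = sum(itemset <= basket for basket in baskets) / n
            if s >= min_support:
                level[itemset] = s
        frequent.update(level)
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step (Apriori property): every subset of a frequent itemset is frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, size - 1))
        }
    return frequent


baskets = [
    {"phone", "phone case"},
    {"phone", "charger"},
    {"phone", "phone case", "charger"},
    {"shoes", "socks"},
    {"shoes"},
]
for itemset, s in sorted(apriori(baskets, 0.4).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
```

With these settings, {phone case, charger} is dropped because it appears in only one basket. As a result, the three-item set {phone, phone case, charger} is never even counted.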

Reinforcement Learning:

Reinforcement learning is a feedback-based learning process. An agent (the algorithm) interacts with an environment, takes actions, and observes their results, including the obstacles it runs into. The agent receives positive feedback for a right action and negative feedback for a wrong one, a bit like teaching the algorithm how to play a game.

In reinforcement learning, there is no labeled data and no expected output. Instead of learning from a fixed labeled dataset, the agent learns from experience, adjusting its actions based on the feedback it receives. Robotics is a typical example: the machine is allowed to take actions and learns from the feedback.

There are two types of reinforcement: positive and negative.

[Figure: Reinforcement learning process]

Positive reinforcement rewards the agent when it takes the right action. For instance, in the image above, the agent is rewarded when it moves toward the S5 block, so it learns that moving to S5 is a rewarding move.

Negative reinforcement gives the agent negative feedback when it makes a mistake. For example, when the agent moves to S8, we can assign a negative score. The agent learns from this mistake, stores the experience in its memory, and avoids moving to S8 in the future.
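This kind of positive and negative feedback is what tabular Q-learning learns from. The original figure's layout isn't reproduced here, so this sketch assumes the following:

* A hypothetical 3×3 grid with states S1 to S9, numbered row by row.
* A reward of +1 for reaching S5 and -1 for reaching S8. Both end the episode.
* Arbitrary hyperparameters (learning rate, discount factor, exploration rate, episode count) chosen only for the illustration.

```python
import random

# Hypothetical layout (assumed; the article's figure is not reproduced):
#   S1 S2 S3
#   S4 S5 S6
#   S7 S8 S9
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
# Positive feedback for reaching S5, negative feedback for reaching S8; both end the episode
REWARDS = {5: 1.0, 8: -1.0}


def step(state, action):
    """Apply an action; moving into a wall leaves the agent where it is."""
    row, col = divmod(state - 1, 3)
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    if not (0 <= new_row < 3 and 0 <= new_col < 3):
        new_row, new_col = row, col
    next_state = new_row * 3 + new_col + 1
    return next_state, REWARDS.get(next_state, 0.0), next_state in REWARDS


def greedy(q, state):
    """Action with the highest learned value in this state (first one wins ties)."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_learning(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(1, 10) for a in ACTIONS}
    start_states = [s for s in range(1, 10) if s not in REWARDS]
    for _ in range(episodes):
        state = rng.choice(start_states)
        for _ in range(50):  # cap the episode length
            # Epsilon-greedy: usually exploit what was learned, sometimes explore
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))
            else:
                action = greedy(q, state)
            next_state, reward, done = step(state, action)
            # Q-learning update: nudge Q(s, a) toward reward + discounted best next value
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q


q_table = q_learning()
state, path = 1, [1]
while state not in REWARDS and len(path) < 10:
    state, _, _ = step(state, greedy(q_table, state))
    path.append(state)
print("Greedy path from S1:", " -> ".join(f"S{s}" for s in path))
```

After training, following the highest-valued action from S1 should reach S5 in two moves. The learned value of moving right from S7 (into S8) should be negative, which is how the agent encodes that it should avoid S8.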

[Figure: Reinforcement learning]

Self-driving cars are another example. They are becoming increasingly sophisticated and work the same way as the maze example. The agent (the car) is reinforced to drive according to a set of rules within its environment (the road). You can read about this in more detail here.

Some of the algorithms used in reinforcement learning are:

Q-learning
State-Action-Reward-State-Action (SARSA)
Deep Q-network
Trust Region Policy Optimization (TRPO)
Proximal Policy Optimization (PPO)
Twin Delayed Deep Deterministic Policy Gradient (TD3)
Soft Actor-Critic (SAC)
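The first two algorithms in the list differ mainly in which next-state value they learn from. The notation below is standard: α is the learning rate, γ the discount factor, r the reward, and s' the next state. Q-learning updates toward the best available next action. SARSA updates toward the action a' that the agent actually takes next.

```latex
\begin{aligned}
\text{Q-learning:}\quad & Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right] \\
\text{SARSA:}\quad & Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma\, Q(s',a') - Q(s,a)\right]
\end{aligned}
```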

That concludes the training stage of the model training process. Depending on the problem you're solving, you can choose the learning method and algorithm best suited to it.

Our job does not stop at training or deploying a model. Taken holistically, the work never really ends; it is an iterative process.

In the next blog, I will take you through retraining, best practices, and tools, and we’ll conclude this series with some real-life examples.

Written By

Thinesh Sridhar, Technical Content Writer