# 10 Machine Learning Algorithms And Their Application

Algorithms are the workhorses of any machine learning model. In other words, machine learning algorithms are the core foundation whenever we work with data or train a model.

In this article, you and I are going on a tour called “10 major machine learning algorithms and their applications”.

The purpose of this tour is either to refresh your memory or to give you an essential understanding of machine learning algorithms.

Along the way we will answer the main questions: what machine learning algorithms are for, where to use them, when to use them, and how to use them.

Before going deeper, let’s have a brief introduction. Machine learning algorithms are mainly classified into three broad categories: supervised learning, unsupervised learning, and reinforcement learning.

In supervised learning, the machine is taught by example. The operator provides the machine learning algorithm with a dataset that includes the inputs together with the desired outputs.

From this set of variables, we generate a function that maps inputs to desired outputs. The algorithm must find a method for getting from those inputs to those outputs, while the operator knows the correct answers to the problem.

The algorithm recognizes patterns in the data, learns from observations, and finally makes predictions. The operator corrects the predictions made by the algorithm, and this process continues until the algorithm achieves the desired level of accuracy/performance on the training data.

Supervised learning is most useful when a property or label is available for a given dataset.

Supervised learning is further divided into classification, regression, and forecasting. Machine learning algorithms such as Decision Tree, Random Forest, KNN, and Logistic Regression are supervised learning algorithms.

In unsupervised learning, the machine studies the data to identify patterns. There are only input variables (X) and no corresponding output variables.

Here the machine learning algorithm interprets large datasets and tries to organize the data in some way that describes its structure. This might mean grouping the data into clusters or arranging it in a way that looks more organized.

Unsupervised learning algorithms use unlabeled training data to model the underlying structure of the data.

Unsupervised algorithms such as Apriori and K-means are very useful when the challenge is to discover implicit relationships in a given unlabeled dataset, i.e. where the items are not pre-assigned to groups.

For example, clustering a population into different groups is widely used to segment customers for specific interventions.

The last category is reinforcement learning, a type of machine learning in which an agent decides the best next action based on its current state by learning behaviors that maximize a reward.

Reinforcement learning focuses on regimented learning processes, where a machine learning algorithm is provided with a set of actions, parameters, and end values. Once the rules are defined, the algorithm explores different options and possibilities, monitoring and evaluating each result to determine which one is optimal.

Reinforcement learning learns from past experience and adapts its approach in response to the situation to achieve the best possible result. In other words, it teaches the machine through trial and error.

In reinforcement learning, some form of feedback is available for each predictive step or action, but there is no precise label or error message.

Reinforcement learning algorithms are often used in robotics, where a robot can learn to avoid collisions by receiving negative feedback after bumping into obstacles, and in video games, where trial and error reveals specific movements that can sharply increase a player’s rewards.
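To make the trial-and-error idea concrete, here is a minimal sketch. It uses tabular Q-learning, which is just one possible reinforcement learning method and is our choice for illustration. The corridor environment, the function name `train_corridor_agent`, and all parameter values are assumptions made up for this example: an agent starts at the left end of a five-cell corridor and receives a reward of 1 only when it reaches the right end.

```python
import random

def train_corridor_agent(n_states=5, episodes=200, alpha=0.5, gamma=0.9,
                         epsilon=0.2, seed=0):
    """Learn action values for a toy corridor by trial and error (Q-learning)."""
    rng = random.Random(seed)
    # q[state][action]; action 0 = move left, action 1 = move right
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)   # explore, or break a tie at random
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(state - 1, 0) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0   # the only feedback signal
            # Move the estimate toward reward + discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = train_corridor_agent()
# Greedy policy for every non-terminal cell of the corridor.
policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
```

Notice that the agent is never told which action is correct. It only sees the reward, and the preference for moving right emerges from repeated attempts.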

Now that we have a general introduction to the types of machine learning algorithms, let us dive into our ten major machine learning algorithms.

**1. Naive Bayes**

Naive Bayes is a machine learning algorithm based on Bayes’ theorem with an assumption of independence between predictors. It is one of the simplest machine learning algorithms, yet it brings a lot of power to the table and is well suited for predictive modeling.

Naive Bayes is called naive because it assumes that each input variable is independent. This is a strong assumption and unrealistic for real data; nevertheless, the technique is very effective on a large range of complex problems.

Bayes’ theorem is a way to find a conditional probability. A conditional probability is the probability of an event happening given that it has some relationship to one or more other events.

For example, your probability of getting a parking space is connected to the time of day you park, where you park, and what conventions are going on at that time.

With this machine learning algorithm, we work with the probability distributions of the variables in the dataset and predict the probability that the response variable takes a particular value, given the attributes of a new instance.

In Naive Bayes, the classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.

The model consists of two types of probabilities that can be calculated directly from your training data:

1) The probability of each class; and 2) The conditional probability of each x value given each class.

In machine learning, we often want to select the best hypothesis (c) given data (x). In a classification problem, our hypothesis (c) may be the class to assign to a new data instance (x). Bayes’ Theorem provides a way to calculate the probability of a hypothesis given our prior knowledge.

Bayes’ Theorem is stated as:

$$P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)}$$

where:

1. P(c|x) is the posterior probability: the probability of hypothesis c being true, given the data x. Under the naive independence assumption, P(c|x) is proportional to P(x1|c) P(x2|c) … P(xn|c) P(c).

2. P(x|c) is the likelihood: the probability of the data x given that the hypothesis c is true.

3. P(c) is the class prior probability: the probability of hypothesis c being true (irrespective of the data).

4. P(x) is the predictor prior probability: the probability of the data (irrespective of the hypothesis).

After calculating the posterior probability for a number of different hypotheses, you can select the hypothesis with the highest probability. This is the maximum probable hypothesis and may formally be called the maximum a posteriori (MAP) hypothesis.

The calculated probability model can also be used to make predictions for new data using Bayes Theorem. When your data is real-valued it is common to assume a Gaussian distribution (bell curve) so that you can easily estimate these probabilities.
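Here is a minimal from-scratch sketch of a Gaussian Naive Bayes classifier, assuming real-valued features and the independence assumption described earlier. The function names and the toy height/weight data are made up for illustration, and the code works with log-probabilities to avoid multiplying many small numbers.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the class prior and the per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # The small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def log_gaussian(x, mean, var):
    """Log of the Gaussian density, used as log P(x_i | c)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict_gaussian_nb(model, row):
    """Return the MAP class: argmax over c of log P(c) + sum_i log P(x_i | c)."""
    scores = {
        label: math.log(prior) + sum(log_gaussian(x, m, v)
                                     for x, m, v in zip(row, means, variances))
        for label, (prior, means, variances) in model.items()
    }
    return max(scores, key=scores.get)

# Toy data (invented): [height_cm, weight_kg] -> size label
X_toy = [[150, 50], [155, 54], [160, 58], [180, 80], [185, 85], [190, 92]]
y_toy = ["small", "small", "small", "large", "large", "large"]
nb_model = fit_gaussian_nb(X_toy, y_toy)
```

Calling `predict_gaussian_nb(nb_model, [152, 52])` scores each class and returns the one with the highest posterior. P(x) is left out because it is the same for every class and does not change which class wins.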

To be practical, the complexity of the above Bayesian classifier needs to be reduced. The Naive Bayes algorithm does that by making an assumption of conditional independence over the training dataset. This drastically reduces the complexity of the problem.

Models based on Naive Bayes are easy to build, particularly useful for very large datasets, and effective on a wide range of complex problems. Despite its simplicity, Naive Bayes can sometimes outperform more sophisticated classification methods.

This kind of machine learning algorithm has many different applications, such as categorizing news, email spam detection, face recognition, sentiment analysis, medical diagnosis, digit recognition, and weather prediction.

If you want to explore Naive Bayes further, see the detailed article “Naive Bayesian Model” by Abhay Kumar, our lead Data Scientist.

**2. Decision Tree**

Decision Tree is one of the best-known machine learning algorithms. It is a tree-like flowchart structure used to represent decisions visually and explicitly and to illustrate every possible outcome of a decision.

It is a graphical representation of possible solutions to a decision based on certain conditions. It’s called a decision tree because it starts with a single box (or root), which then branches off into a number of solutions, just like a tree.

The tree can be described by two entities: decision nodes and leaves. The leaves are the decisions or final outcomes. Each decision node represents a test on a specific variable and is where the data is split.

The Decision Tree algorithm is a supervised learning algorithm that works for both categorical and continuous dependent variables.

A decision tree is drawn upside down, with its root at the top. In the accompanying figure, the bold black text represents a condition (internal node), based on which the tree splits into branches (edges).

The end of a branch that doesn’t split anymore is the decision (leaf), in this case whether the passenger died or survived, shown as red and green text respectively.

Decision trees are mostly used for classification problems.

There are two main types of decision trees: classification trees (yes/no types) and regression trees (continuous data types).

Tree models where the target variable can take a discrete set of values are called classification trees. In these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels.

Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees.

When using this algorithm we make a few assumptions. At the beginning, the whole training set is considered as the root. Feature values are preferred to be categorical; if the values are continuous, they are discretized prior to building the model.

Records are distributed recursively on the basis of attribute values. The order in which attributes are placed as the root or internal nodes of the tree is determined using some statistical approach.
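One common statistical criterion for choosing which attribute to split on is Gini impurity (information gain is another). The sketch below assumes Gini impurity and numeric features with threshold splits. The function names and the small passenger-style dataset are invented for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Try every (feature, threshold) pair and return the one whose two
    groups have the lowest weighted Gini impurity."""
    best = None  # (impurity, feature_index, threshold)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must put something on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best

# Toy data (invented): [age, fare] -> outcome
rows = [[22, 7], [38, 71], [26, 8], [35, 53], [54, 51], [2, 21]]
labels = ["died", "survived", "died", "survived", "survived", "died"]
split_impurity, split_feature, split_threshold = best_split(rows, labels)
```

A full decision tree would apply `best_split` recursively to the left and right groups until the leaves are pure or some stopping rule is reached.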

Decision trees are applied in numerous areas, such as predicting and reducing customer churn across many industries, fraud detection in the insurance sector, and credit risk scoring in banking and financial services.

**3. Linear Regression**

Linear regression is one of the best-known machine learning algorithms and a very simple approach to supervised learning.

Linear regression is the most basic type of regression. It was developed in the field of statistics and is studied as a model for understanding the relationship between input and output numerical variables, but has been borrowed by machine learning.

Linear regression is a linear model, i.e. a model that assumes a linear relationship between the input variables (x) and the single output variable (y). More specifically, y can be calculated from a linear combination of the input variables (x).

In machine learning, we have a set of input variables (x) which are used to determine the output variable (y). A relationship exists between the input variables and the output variable. The goal of ML is to quantify this relationship.

Whenever there is a single input variable (x), the method is referred to as simple linear regression. When there are multiple input variables, literature from statistics often refers to the method as multiple linear regression.

To understand how linear regression works, imagine how you would arrange random logs of wood in increasing order of their weight.

There is a catch, however – you cannot actually weigh each log. You have to guess its weight just by looking at the height and girth of the log (visual analysis) and arrange them using a combination of these visible parameters. This is what linear regression is like.

Mathematically, we can write a linear relationship as:

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$$

**Where:**

1) *y* is the response

2) The *β* values are called the **model coefficients**. These values are “learned” during the model fitting/training step.

3) *β0* is the intercept

4) *β1* is the coefficient for *X1* (the first feature)

5) *βn* is the coefficient for *Xn* (the nth feature)

There are different techniques that we can use to learn the linear regression model from data, such as a linear algebra solution for ordinary least squares and gradient descent optimization.
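As a small illustration of both techniques, the sketch below fits a simple (one-feature) linear regression in two ways: with the closed-form ordinary least squares solution and with batch gradient descent on the mean squared error. The function names, learning rate, step count, and toy data are assumptions chosen for this example.

```python
def fit_ols(xs, ys):
    """Closed-form least squares for y = beta0 + beta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1

def fit_gradient_descent(xs, ys, lr=0.01, steps=5000):
    """Minimize the mean squared error by repeatedly stepping against its gradient."""
    beta0, beta1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(beta0 + beta1 * x) - y for x, y in zip(xs, ys)]
        grad0 = 2 * sum(errors) / n
        grad1 = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        beta0 -= lr * grad0
        beta1 -= lr * grad1
    return beta0, beta1

# Toy data (invented) lying exactly on the line y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
ols_b0, ols_b1 = fit_ols(xs, ys)
gd_b0, gd_b1 = fit_gradient_descent(xs, ys)
```

Both approaches should recover an intercept close to 1 and a slope close to 2 on this data. The closed form is exact, while gradient descent approaches the same answer step by step and extends more easily to very large datasets.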

Linear regression has been around for more than 200 years and has been extensively studied.

Some good rules of thumb when using this technique are to remove variables that are very similar (correlated) and to remove noise from your data, if possible. It is a fast and simple technique and a good first algorithm to try.

If you want to know more about linear regression, see Jason Brownlee’s article “Linear Regression for Machine Learning”. Jason Brownlee, Ph.D., is a machine learning specialist.

**4. Logistic Regression**

Logistic regression is one of the most widely used machine learning algorithms for binary classification problems. It focuses on calculating the probability of an event occurring based on previously provided data.

It is a statistical method used to estimate discrete values from a set of independent variables. It predicts the probability of an event by fitting data to a logit function, and it allows one to say by what factor the presence of a risk factor increases the odds of a given outcome.

In logistic regression, the output is in the form of probabilities of the default class (unlike linear regression, where the output is directly produced). As it is a probability, the output lies in the range of 0-1.

The output (y-value) is generated by passing the x-value through the logistic function h(x) = 1 / (1 + e^-x). A threshold is then applied to force this probability into a binary classification.

The logistic regression model computes a weighted sum of the input variables, similar to linear regression, but it runs the result through a special non-linear function, the logistic function or sigmoid function, to produce the output y. After thresholding, the output is binary, in the form of 0/1 or -1/1.

**The sigmoid/logistic function is given by the following equation:**

$$y = \frac{1}{1 + e^{-x}}$$

Its graph is an S-shaped curve that gets closer to 1 as the input variable increases above 0 and closer to 0 as the input variable decreases below 0. The output of the sigmoid function is 0.5 when the input variable is 0.

Thus, if the output is more than 0.5, we can classify the outcome as 1 (or positive) and if it is less than 0.5, we can classify it as 0 (or negative).
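The thresholding rule can be sketched in a few lines. The function names are made up for this example, and treating an output of exactly 0.5 as class 1 is our own assumption, since the rule above leaves that case open.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    """Turn the sigmoid output into a 0/1 class label."""
    # Assumption: an output exactly equal to the threshold counts as class 1.
    return 1 if sigmoid(z) >= threshold else 0

examples = [(-2.0, classify(-2.0)), (0.0, classify(0.0)), (2.0, classify(2.0))]
```

Here `z` stands for the weighted sum of the input variables. Positive sums fall on the class 1 side of the curve and negative sums on the class 0 side.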

The goal of logistic regression is to use the training data to find the coefficient values that minimize the error between the predicted outcome and the actual outcome. These coefficients are estimated using Maximum Likelihood Estimation.

Maximum Likelihood Estimation is a general approach to estimating parameters in statistical models. The likelihood can be maximized with a numerical optimization algorithm.

Newton’s Method is such an algorithm and can be used to find the maximum (or minimum) of many different functions, including the likelihood function. Instead of Newton’s Method, you could also use Gradient Descent.
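Below is a minimal sketch of the gradient-based route for a single input variable. It performs gradient ascent on the average log-likelihood, which is equivalent to gradient descent on the negative log-likelihood. The function names, learning rate, step count, and the hours-studied data are all assumptions invented for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Estimate weight w and bias b by maximizing the average log-likelihood."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # For the log-likelihood, the gradient is driven by (actual - predicted).
        residuals = [y - sigmoid(w * x + b) for x, y in zip(xs, ys)]
        grad_w = sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = sum(residuals) / n
        w += lr * grad_w   # step uphill on the likelihood
        b += lr * grad_b
    return w, b

# Toy data (invented): hours studied -> passed (1) or failed (0)
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 1, 0, 1, 1, 1]
w_fit, b_fit = fit_logistic(hours, passed)
```

With the fitted values, `sigmoid(w_fit * x + b_fit)` gives the estimated probability of passing after `x` hours of study.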

Example: In predicting whether an event will occur or not, the event that it occurs is classified as 1. In predicting whether a person will be sick or not, the sick instances are denoted as 1. Logistic regression is named after the transformation function used in it, the logistic function h(x) = 1 / (1 + e^-x), which is an S-shaped curve.

In general, this machine learning algorithm can be used in real-world applications such as credit scoring, measuring the success rates of marketing campaigns, and predicting the revenues of a certain product.

**5. K-Nearest Neighbors**

KNN is a very simple and very effective machine learning algorithm. It is a non-parametric, lazy-learning algorithm, which means that there is no explicit training phase before classification.

It uses a database in which the data points are separated into several classes to predict the classification of a new sample point. The k-nearest neighbors algorithm keeps the entire training dataset as its model instead of distilling it into a set of learned parameters.

KNN can require a lot of memory or space to store all of the data, but it only performs a calculation (or learns) when a prediction is needed, just in time. You can also update and curate your training instances over time to keep predictions accurate.

In the example figure, the K-Nearest Neighbors process assigns the new data point to the red category.

The K-Nearest Neighbors algorithm estimates how likely a data point is to be a member of one group or another. It essentially looks at the data points around a single data point to determine what group it is actually in.

For example, if one point is on a grid and the algorithm is trying to determine what group that data point is in (Group A or Group B, for example), it would look at the data points near it to see what group the majority of the points are in.

In KNN, predictions are made for a new data point by searching through the entire training set for the K most similar instances (the neighbors) and summarizing the output variable for those K instances.
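A minimal sketch of this procedure for classification is shown below. It assumes Euclidean distance as the similarity measure and a majority vote as the summary. The function name, the value k=3, and the toy points are invented for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by a majority vote among its k nearest training points."""
    # "Training" is just storing the data; all the work happens at prediction time.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data (invented): two well-separated groups on a grid
knn_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
knn_labels = ["A", "A", "A", "B", "B", "B"]
near_a = knn_predict(knn_points, knn_labels, (2, 2))
near_b = knn_predict(knn_points, knn_labels, (6, 5))
```

For regression, the same neighbor search applies, but the summary would typically be the mean of the neighbors' values instead of a vote.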

KNN has various applications. It is often used in search applications where you are looking for similar items, that is, when your task is some form of “find items similar to this one”. You’d call this a k-NN search.

**6. Learning Vector Quantization**

In computer science, learning vector quantization (LVQ) is a supervised neural network that uses a competitive (winner-take-all) learning strategy.

It is related to other supervised neural networks such as the Perceptron and the back-propagation algorithm. LVQ is an artificial neural network algorithm that lets you choose how many training instances to hang onto and learns exactly what those instances should look like.

It is also related to other competitive learning neural networks such as the Self-Organizing Map algorithm that is a similar algorithm for unsupervised learning with the addition of connections between the neurons.

Additionally, LVQ is a baseline technique that was defined with a few variants (LVQ1, LVQ2, LVQ2.1, LVQ3, OLVQ1, and OLVQ3) as well as many third-party extensions and refinements too numerous to list.

The representation for LVQ is a collection of codebook vectors. These are selected randomly in the beginning and adapted to best summarize the training dataset over a number of iterations of the learning algorithm.

Once learned, the codebook vectors can be used to make predictions. The most similar neighbor (best matching codebook vector) is found by calculating the distance between each codebook vector and the new data instance.

The class value or real value in the case of regression for the best matching unit is then returned as the prediction. Best results are achieved if you rescale your data to have the same range, such as between 0 and 1.
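The sketch below shows the basic LVQ1-style update under a few assumptions: Euclidean distance, a linearly decaying learning rate, and fixed starting codebook vectors instead of random ones so the example is reproducible. The function names and the toy data, already scaled to the 0 to 1 range, are invented for illustration.

```python
import math

def nearest_codebook(codebooks, x):
    """Index of the best matching unit (the closest codebook vector)."""
    return min(range(len(codebooks)), key=lambda i: math.dist(codebooks[i][0], x))

def train_lvq1(data, labels, codebooks, lr=0.3, epochs=20):
    """codebooks is a list of (vector, class_label) pairs; returns adapted copies."""
    codebooks = [(list(vec), lab) for vec, lab in codebooks]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)       # learning rate decays toward zero
        for x, y in zip(data, labels):
            vec, lab = codebooks[nearest_codebook(codebooks, x)]
            sign = 1 if lab == y else -1       # attract if same class, repel if not
            for d in range(len(vec)):
                vec[d] += sign * rate * (x[d] - vec[d])
    return codebooks

def predict_lvq(codebooks, x):
    """Return the class of the best matching codebook vector."""
    return codebooks[nearest_codebook(codebooks, x)][1]

# Toy data (invented), rescaled to the range 0..1
lvq_data = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.8, 0.9), (0.9, 0.8), (0.85, 0.85)]
lvq_labels = ["red", "red", "red", "blue", "blue", "blue"]
initial = [([0.4, 0.4], "red"), ([0.6, 0.6], "blue")]
trained = train_lvq1(lvq_data, lvq_labels, initial)
```

After training, only the two codebook vectors need to be kept to make predictions, not the six training points.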

LVQ has various applications, such as localization of myocardial infarction, fault diagnosis of power transformers, and classification of breast lesions.

**Example:**

Suppose there are three classes {red, blue, green}. An LVQ with two neurons per color adjusts the weight vectors of its neurons so that they become typical red, blue, and green *reference* or codebook vectors. The input vector *x* has only two elements, so it can be shown on a 2D plot.

If you discover that KNN gives good results on your dataset, try using LVQ to reduce the memory requirements of storing the entire training dataset.

**7. Support Vector Machines**

Support vector machines are supervised machine learning algorithms that are widely used for classification.

The objective of the support vector machine algorithm is to find a hyperplane in N-dimensional space (where N is the number of features) that distinctly classifies the data points.

In this algorithm, we plot each data item as a point in n-dimensional space, with the value of each feature being the value of a particular coordinate. Then we perform classification by finding the hyperplane that best differentiates the two classes.

In SVM, a hyperplane is selected to best separate the points in the input variable space by their class, either class 0 or class 1. The loss function that helps maximize the margin is hinge loss.

In two dimensions, you can visualize this as a line; let’s assume that all of our input points can be completely separated by this line. The SVM learning algorithm finds the coefficients that result in the best separation of the classes by the hyperplane.
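As a rough illustration, the sketch below trains a linear SVM by stochastic subgradient descent on the regularized hinge loss. This is one simple training method among several and is our choice for the example. It assumes the two classes are encoded as -1 and +1, which is the usual convention for hinge loss. The function names, hyperparameters, and toy points are invented.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def hinge_loss(w, b, X, y, reg=0.01):
    """Regularized hinge loss: small weights plus a penalty for margin violations."""
    margin_losses = [max(0.0, 1 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    return reg * dot(w, w) / 2 + sum(margin_losses) / len(X)

def train_linear_svm(X, y, lr=0.1, reg=0.01, epochs=200):
    """Find hyperplane coefficients (w, b) by subgradient descent, one point at a time."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:
                # Point is inside the margin or misclassified: hinge term is active.
                w = [wj - lr * (reg * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Point is safely classified: only the regularizer shrinks w.
                w = [wj - lr * reg * wj for wj in w]
    return w, b

def svm_predict(w, b, x):
    """Which side of the hyperplane w.x + b = 0 does x fall on?"""
    return 1 if dot(w, x) + b >= 0 else -1

# Toy data (invented): two linearly separable groups, labels in {-1, +1}
X_svm = [(-2, -1), (-1, -2), (-2, -2), (2, 1), (1, 2), (2, 2)]
y_svm = [-1, -1, -1, 1, 1, 1]
w_svm, b_svm = train_linear_svm(X_svm, y_svm)
```

In two dimensions the learned `w_svm` and `b_svm` describe the separating line, and `svm_predict` simply reports which side of that line a new point lies on.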

The basic concept behind support vector machines is that of decision planes that define decision boundaries. A decision plane is one that separates sets of objects having different class memberships.

As for its pros, SVM is among the more accurate machine learning algorithms. Support vector machines also work well on smaller, cleaner datasets, and they can be more efficient because they use only a subset of the training points.

As for its cons, it is not well suited to larger datasets because the training time can be high, and it is less effective on noisier datasets with overlapping classes.

SVMs are often used for face detection, where they classify parts of the image as face or non-face and create a square boundary around the face. They are also used for text and hypertext categorization, classification of images, bioinformatics, etc.

**8. Apriori**

The Apriori algorithm is an unsupervised algorithm frequently used to sort information into categories. Sorted information is very helpful in any data management process; it ensures that data users are apprised of new information and can make sense of the data they are working with.

The Apriori algorithm generates association rules from a given dataset. It works with a “bottom-up” approach in which frequent subsets are extended one item at a time, and the algorithm terminates when no further extension is possible.

It is used on a transactional database to mine frequent itemsets and then generate association rules.

It is popularly used in market basket analysis, where one checks for combinations of products that frequently co-occur in the database.

The Apriori algorithm rests on two basic principles: first, if an itemset occurs frequently, then all of its subsets occur frequently; second, if an itemset occurs infrequently, then all of its supersets occur infrequently.

We usually write the association rule “if a person purchases item X, then he purchases item Y” as X -> Y.

For example, if a person purchases milk and sugar, then he is likely to purchase coffee powder. This could be written in the form of an association rule as {milk, sugar} -> coffee powder. A rule is kept only if it meets the minimum thresholds for support and confidence.
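To see what support and confidence mean in numbers, here is a small sketch using the standard definitions: support is the fraction of transactions containing an itemset, and the confidence of X -> Y is the support of X and Y together divided by the support of X. The function names and the five shopping baskets are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also
    contain the consequent. Assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Toy transaction database (invented)
baskets = [
    {"milk", "sugar", "coffee powder"},
    {"milk", "sugar", "coffee powder", "bread"},
    {"milk", "sugar"},
    {"milk", "bread"},
    {"sugar", "coffee powder"},
]
rule_support = support(baskets, {"milk", "sugar", "coffee powder"})
rule_confidence = confidence(baskets, {"milk", "sugar"}, {"coffee powder"})
```

In this toy database, milk and sugar appear together in three of the five baskets, and two of those three also contain coffee powder, so the rule has a support of 2/5 and a confidence of 2/3.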

The Support measure helps prune the number of candidate itemsets to be considered during frequent itemset generation. This support measure is guided by the Apriori principle.

The Apriori principle states that if an itemset is frequent, then all of its subsets must also be frequent.
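The following is a compact sketch of how this principle can drive frequent itemset mining. It grows itemsets one item at a time and discards any candidate that has an infrequent subset. The function name `apriori`, the minimum support of 0.4, and the baskets are assumptions for this example, and the code favors readability over speed.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return a dict mapping each frequent itemset (a frozenset) to its support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= basket for basket in transactions) / n

    items = sorted({item for basket in transactions for item in basket})
    current = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori principle): every (k-1)-subset must itself be frequent.
        previous = set(current)
        candidates = [c for c in candidates
                      if all(frozenset(sub) in previous for sub in combinations(c, k - 1))]
        current = [c for c in candidates if support(c) >= min_support]
    return frequent

# Toy transaction database (invented)
shop_baskets = [
    {"milk", "sugar", "coffee powder"},
    {"milk", "sugar", "coffee powder", "bread"},
    {"milk", "sugar"},
    {"milk", "bread"},
    {"sugar", "coffee powder"},
]
frequent_itemsets = apriori(shop_baskets, min_support=0.4)
```

The loop stops as soon as no candidate of the next size survives, which matches the bottom-up behavior described earlier. Association rules would then be generated from the itemsets in `frequent_itemsets`.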

The Apriori algorithm works by recognizing a particular characteristic of a dataset and noting how frequently that characteristic pops up throughout the set. The characteristics that are frequent can then be analyzed and placed into pairs.

This process helps to point out more relationships between relevant data points. Other forms of data can be pruned and placed into their own categories.

The definition of “frequent” is inherently relative and only makes sense in context.

Therefore, the idea is implemented in the Apriori algorithm through a pre-arranged threshold determined by either the operator or the algorithm. A “frequent” data characteristic is one that occurs above that pre-arranged threshold, known as the minimum support.

Analysis can detect more and more relations throughout the body of data until the algorithm has exhausted all of the possibilities.

In retail, Apriori helps customers find and buy their items with ease, and it enhances the sales performance of the department store.

This algorithm has utility in the field of healthcare as it can help in detecting adverse drug reactions (ADR) by producing association rules to indicate the combination of medications and patient characteristics that could lead to ADRs.

**9. Boosting with AdaBoost**

AdaBoost is a boosting algorithm that is mostly used when there is a massive amount of data to be handled in order to make predictions with high accuracy.

Boosting algorithms such as AdaBoost are powerful, flexible, and can be interpreted nicely with some tricks. Boosting is an ensemble technique that attempts to create a strong classifier from a number of weak classifiers.

This is done by building a model from the training data, then creating a second model that attempts to correct the errors from the first model. Models are added until the training set is predicted perfectly or a maximum number of models is added.
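Here is a minimal sketch of AdaBoost for binary classification, assuming labels in {-1, +1}, a single numeric feature, and decision stumps (one-threshold rules) as the weak classifiers. For simplicity it always runs a fixed number of rounds instead of stopping early. The function names and the ten-point dataset are assumptions for this example.

```python
import math

def stump_predict(x, threshold, polarity):
    """A decision stump: one side of the threshold gets +1, the other -1."""
    return polarity if x <= threshold else -polarity

def train_stump(xs, ys, weights):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = None  # (weighted_error, threshold, polarity)
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n                     # start with uniform weights
    ensemble = []                               # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err) # better stumps get a bigger say
        ensemble.append((alpha, threshold, polarity))
        # Misclassified points gain weight, so the next stump focuses on them.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, x):
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data (invented): no single threshold separates the classes
boost_xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
boost_ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(boost_xs, boost_ys, rounds=3)
```

On this data no single stump can classify every point correctly, but the weighted vote of three stumps fits all ten training points.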

In short, it combines multiple weak or average predictors to build a strong predictor. Boosting algorithms often perform well in data science competitions such as Kaggle, AV Hackathon, and CrowdAnalytix.

AdaBoost was the first really successful boosting algorithm developed for binary classification.

It is a good starting point for understanding boosting. Modern boosting methods build on AdaBoost, most notably stochastic gradient boosting machines.

Boosted algorithms are used when we have plenty of data to make a prediction and we seek exceptionally high predictive power. Boosting is used primarily for reducing bias, and also variance, in supervised learning.

**10. Random Forest**

We are now at the end of our tour, and the last algorithm we are going to see is Random Forest.

Random Forest is an easy-to-use, powerful, and very flexible algorithm. It is a type of ensemble machine learning algorithm based on Bootstrap Aggregation, or bagging.

The Random Forest algorithm can be used for both classification and regression problems. It is a tweak on bagged decision trees: instead of always selecting the optimal split point, it makes suboptimal splits by introducing randomness.

As the name suggests, this algorithm creates a forest and makes it somewhat random.

The forest it builds is an ensemble of Decision Trees, which we discussed previously, and most of the time it is trained with the “bagging” method. The basic idea behind bagging is that a combination of learning models improves the overall result.

If you get good results with an algorithm with high variance (like decision trees), you can often get better results by bagging that algorithm.

For classifying a new object based on attributes, each tree gives a classification and we say the tree “votes” for that class. The forest chooses the classification having the most votes (over all the trees in the forest).

Each tree of the forest is planted and grown as follows. If the number of cases in the training set is N, then a sample of N cases is taken at random but *with replacement*. This sample will be the training set for growing the tree.

If there are M input variables, then a number m<<M is specified such that at each node, m variables are selected at random out of the M, and the best split on these m is used to split the node.

The value of m is held constant during the forest growing.

Each tree is grown to the largest extent possible. There is no pruning.
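The growing procedure can be sketched as follows. This is a simplified illustration, not a production implementation. It assumes numeric features, string class labels, and Gini impurity as the split criterion, which is our choice here. The function names, the 25 trees, m=1, the fixed random seed, and the toy points are all invented for the example.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(rows, labels, m, rng):
    """Grow an unpruned tree; each node considers only m randomly chosen features."""
    if len(set(labels)) == 1:
        return labels[0]                              # pure leaf
    best = None
    for f in rng.sample(range(len(rows[0])), m):
        for t in sorted({r[f] for r in rows})[:-1]:   # skip the max so both sides are non-empty
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            score = (len(left) * gini([labels[i] for i in left]) +
                     len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:                                  # chosen features cannot split: majority leaf
        return Counter(labels).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            grow_tree([rows[i] for i in left], [labels[i] for i in left], m, rng),
            grow_tree([rows[i] for i in right], [labels[i] for i in right], m, rng))

def tree_predict(tree, row):
    while isinstance(tree, tuple):                    # internal nodes are tuples, leaves are labels
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def train_forest(rows, labels, n_trees=25, m=1, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]    # N cases drawn with replacement
        forest.append(grow_tree([rows[i] for i in idx], [labels[i] for i in idx], m, rng))
    return forest

def forest_predict(forest, row):
    """Each tree votes; the class with the most votes wins."""
    return Counter(tree_predict(tree, row) for tree in forest).most_common(1)[0][0]

# Toy data (invented): two well-separated groups with M = 2 features
rf_rows = [[1, 1], [2, 1], [1, 2], [2, 2], [8, 8], [9, 8], [8, 9], [9, 9]]
rf_labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
forest = train_forest(rf_rows, rf_labels)
```

Each tree sees a different bootstrap sample and a different random choice of features, so individual trees can disagree. The majority vote in `forest_predict` smooths out those differences.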

Random Forest is used in a wide variety of applications. Industries that heavily use it include banking, medicine, the stock market, and e-commerce.

An advantage of Random Forest is that it is much less prone to overfitting than a single decision tree, although overfitting is still possible. Also, the same random forest algorithm can be used for both classification and regression tasks.

**Conclusion:**

In the end, the only thing I want to say is that machine learning is a huge field and the algorithms above are only a few of them. The choice and application of an algorithm mostly depend on what kind of project you are working on. Keep exploring, keep learning, and make this world a better place to live.
