SEARCHING ALGORITHMS
Searching algorithms are methods or procedures used to find a specific item or element
within a collection of data. These algorithms are widely used in computer science and are
crucial for tasks like searching for a particular record in a database, finding an element in a
sorted list, or locating a file on a computer.
These are some commonly used searching algorithms:
1. Linear Search: In this simple algorithm, each element in the collection is
sequentially checked until the desired item is found, or the entire list is traversed. It is
suitable for small-sized or unsorted lists, but its time complexity is O(n) in the worst
case.
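As a minimal sketch, linear search can be written in Python like this:

```python
def linear_search(items, target):
    """Check each element in order; return its index, or -1 if absent."""
    for i, value in enumerate(items):
        if value == target:
            return i
    return -1
```

In the worst case (target absent or in the last position) the loop inspects all n elements, which is the O(n) bound mentioned above.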
2. Binary Search: This algorithm is applicable only to sorted lists. It repeatedly
compares the middle element of the list with the target element and narrows down
the search range by half based on the comparison result. Binary search has a time
complexity of O(log n), making it highly efficient for large sorted lists.
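A minimal Python sketch of binary search on a sorted list:

```python
def binary_search(sorted_items, target):
    """Repeatedly halve the search range; the input list must be sorted."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1
```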
3. Hashing: Hashing algorithms use a hash function to convert the search key into an
index or address of an array (known as a hash table). This allows for constant-time
retrieval of the desired item if the hash function is well-distributed and collisions are
handled appropriately. Common hashing techniques include direct addressing,
separate chaining, and open addressing.
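A toy hash table using separate chaining, sketched in Python. (Python's built-in dict already provides this behavior; the class below only illustrates the mechanics of hashing into buckets and resolving collisions with chains.)

```python
class ChainedHashTable:
    """Hash table that resolves collisions with per-bucket chains (lists)."""

    def __init__(self, buckets=8):
        self.buckets = [[] for _ in range(buckets)]

    def _index(self, key):
        # Hash function maps a key to a bucket index.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for pair in bucket:          # update the key if it already exists
            if pair[0] == key:
                pair[1] = value
                return
        bucket.append([key, value])  # otherwise chain a new entry

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None
```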
4. Interpolation Search: Similar to binary search, interpolation search works on sorted
lists. Instead of always dividing the search range in half, interpolation search uses the
value of the target element and the values of the endpoints to estimate its
approximate position within the list. This estimation helps in quickly narrowing
down the search space. The time complexity of interpolation search is typically O(log
log n) on average if the data is uniformly distributed.
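A Python sketch of interpolation search; note how the probe position is estimated from the endpoint values rather than always taken as the midpoint:

```python
def interpolation_search(sorted_items, target):
    """Probe a position estimated from the endpoint values; list must be sorted."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi and sorted_items[lo] <= target <= sorted_items[hi]:
        if sorted_items[lo] == sorted_items[hi]:
            return lo if sorted_items[lo] == target else -1
        # Linear interpolation between the endpoint values.
        pos = lo + (target - sorted_items[lo]) * (hi - lo) // (
            sorted_items[hi] - sorted_items[lo])
        if sorted_items[pos] == target:
            return pos
        elif sorted_items[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1
```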
5. Tree-based Searching: Various tree data structures, such as binary search trees
(BST), AVL trees, or B-trees, can be used for efficient searching. These structures
impose an ordering on the elements and provide fast search, insertion, and deletion
operations. The time complexity of tree-based searching algorithms depends on the
height of the tree and can range from O(log n) to O(n) in the worst case.
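A minimal binary search tree sketch in Python (no balancing, so the worst-case height, and therefore search time, can degrade to O(n)):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def bst_insert(root, key):
    """Insert a key, preserving the BST ordering property."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    else:
        root.right = bst_insert(root.right, key)
    return root

def bst_search(root, key):
    """Walk down the tree, going left or right based on comparisons."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None
```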
6. Ternary Search: Ternary search is an algorithm that operates on sorted lists and
repeatedly divides the search range into three parts instead of two, based on two
splitting points. It is a divide-and-conquer approach and has a time complexity of
O(log₃ n).
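A Python sketch of ternary search on a sorted list, using two splitting points per iteration:

```python
def ternary_search(sorted_items, target):
    """Divide the range into three parts using two midpoints m1 and m2."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        third = (hi - lo) // 3
        m1, m2 = lo + third, hi - third
        if sorted_items[m1] == target:
            return m1
        if sorted_items[m2] == target:
            return m2
        if target < sorted_items[m1]:
            hi = m1 - 1          # target is in the first third
        elif target > sorted_items[m2]:
            lo = m2 + 1          # target is in the last third
        else:
            lo, hi = m1 + 1, m2 - 1  # target is in the middle third
    return -1
```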
7. Jump Search: Jump search is an algorithm for sorted lists that works by jumping
ahead a fixed number of steps and then performing linear search in the reduced
subarray. It is useful for large sorted arrays and has a time complexity of O(sqrt(n)),
where n is the size of the array.
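A Python sketch of jump search with a block size of √n:

```python
import math

def jump_search(sorted_items, target):
    """Jump in blocks of sqrt(n), then scan linearly inside one block."""
    n = len(sorted_items)
    step = int(math.sqrt(n)) or 1
    prev = 0
    # Jump ahead while the end of the current block is still below the target.
    while prev < n and sorted_items[min(prev + step, n) - 1] < target:
        prev += step
    # Linear search within the block that may contain the target.
    for i in range(prev, min(prev + step, n)):
        if sorted_items[i] == target:
            return i
    return -1
```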
8. Exponential Search: Exponential search is a technique that combines elements of
binary search and linear search. It begins with a small range and doubles the search
range until the target element is within the range. It then performs a binary search
within that range. Exponential search is advantageous when the target element is
likely to be found near the beginning of the array and has a time complexity of O(log
n).
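A Python sketch of exponential search: doubling a bound to bracket the target, then binary searching inside that bracket:

```python
def exponential_search(sorted_items, target):
    """Double a bound until it passes the target, then binary search that range."""
    if not sorted_items:
        return -1
    if sorted_items[0] == target:
        return 0
    bound = 1
    while bound < len(sorted_items) and sorted_items[bound] < target:
        bound *= 2
    # Binary search between the previous bound and the current one.
    lo, hi = bound // 2, min(bound, len(sorted_items) - 1)
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```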
9. Fibonacci Search: Fibonacci search is a searching algorithm that uses Fibonacci
numbers to divide the search space. It works on sorted arrays and has a similar
approach to binary search, but instead of dividing the array into halves, it divides it
into two parts using Fibonacci numbers as indices. Fibonacci search has a time
complexity of O(log n).
10. Interpolation Search for Trees: This algorithm is an extension of interpolation
search designed for tree structures such as AVL trees or Red-Black trees. It combines
interpolation search principles with tree traversal to efficiently locate elements in the
tree based on their values. The time complexity depends on the tree structure and can
range from O(log n) to O(n) in the worst case.
11. Hash-based Searching (e.g., Bloom Filter): Hash-based searching algorithms
utilize hash functions and data structures like Bloom filters to determine whether an
element is present in a set or not. These algorithms provide probabilistic answers,
meaning they can occasionally have false positives (indicating an element is present
when it is not), but no false negatives (if an element is not present, it will never claim
it is). Bloom filters have a constant-time complexity for search operations.
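A toy Bloom filter sketch in Python; the bit-array size and the number of hash functions below are arbitrary illustrative choices:

```python
import hashlib

class BloomFilter:
    """Probabilistic set membership: false positives possible, no false negatives."""

    def __init__(self, size=1024, num_hashes=3):
        self.size, self.num_hashes = size, num_hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive several bit positions by salting one cryptographic hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))
```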
12. String Searching Algorithms: Searching algorithms specific to string data include
techniques like Knuth-Morris-Pratt (KMP) algorithm, Boyer-Moore algorithm,
Rabin-Karp algorithm, and many others. These algorithms optimize the search for
patterns within text or strings and are widely used in text processing, pattern
matching, and string matching tasks.
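As one concrete example, a Python sketch of the Knuth-Morris-Pratt (KMP) algorithm, which avoids re-examining text characters by precomputing a failure table for the pattern:

```python
def kmp_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    # Failure table: length of the longest proper prefix that is also a suffix.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    # Scan the text, reusing matched prefix lengths on a mismatch.
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1
```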
A decision tree is a structure that includes a root node, branches, and leaf nodes. Each
internal node denotes a test on an attribute, each branch denotes the outcome of a test, and
each leaf node holds a class label. The topmost node in the tree is the root node.
The following decision tree is for the concept buy_computer that indicates whether a
customer at a company is likely to buy a computer or not. Each internal node represents a
test on an attribute. Each leaf node represents a class.
The benefits of having a decision tree are as follows −
It does not require any domain knowledge.
It is easy to comprehend.
The learning and classification steps of a decision tree are simple and fast.
Decision Tree Induction Algorithm
In 1980, the machine learning researcher J. Ross Quinlan developed a decision tree
algorithm known as ID3 (Iterative Dichotomiser). He later presented C4.5, the successor of
ID3. Both ID3 and C4.5 adopt a greedy approach with no backtracking; the trees are
constructed in a top-down, recursive, divide-and-conquer manner.
Generating a decision tree from training tuples of data partition D
Algorithm: Generate_decision_tree
Input:
Data partition, D, which is a set of training tuples
and their associated class labels.
attribute_list, the set of candidate attributes.
Attribute_selection_method, a procedure to determine the
splitting criterion that best partitions the data
tuples into individual classes. This criterion includes a
splitting_attribute and either a splitting point or splitting subset.
Output:
A Decision Tree
Method:
create a node N;
if the tuples in D are all of the same class, C, then
return N as a leaf node labeled with the class C;
if attribute_list is empty then
return N as a leaf node labeled with the majority class in D;
apply Attribute_selection_method(D, attribute_list) to find
the best splitting_criterion, and label node N with it;
for each outcome j of the splitting_criterion
let Dj be the set of data tuples in D satisfying outcome j;
if Dj is empty then
attach a leaf labeled with the majority
class in D to node N;
else
attach the node returned by
Generate_decision_tree(Dj, attribute_list) to node N;
end for
return N;
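The attribute selection method in the algorithm is commonly implemented with information gain, as in ID3. A minimal Python sketch, assuming discrete-valued attributes:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Reduction in entropy from splitting on one discrete attribute."""
    base = entropy(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in partitions.values())
    return base - remainder
```

The algorithm would evaluate this gain for every candidate attribute and split on the one with the highest value.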
Tree Pruning
Tree pruning is performed in order to remove anomalies in the training data due to noise or
outliers. The pruned trees are smaller and less complex.
Tree Pruning Approaches
There are two approaches to prune a tree −
Pre-pruning − The tree is pruned by halting its construction early.
Post-pruning − This approach removes a sub-tree from a fully grown tree.
Cost Complexity
The cost complexity is measured by the following two parameters −
Number of leaves in the tree, and
Error rate of the tree.
A DECISION TREE
A Decision Tree is an algorithm used for supervised learning problems such as classification
or regression. A decision tree or a classification tree is a tree in which each internal (nonleaf)
node is labeled with an input feature. The arcs coming from a node labeled with a feature are
labeled with each of the possible values of the feature. Each leaf of the tree is labeled with a
class or a probability distribution over the classes.
A tree can be "learned" by splitting the source set into subsets based on an attribute value
test. This process is repeated on each derived subset in a recursive manner called recursive
partitioning. The recursion is completed when the subset at a node has all the same value of
the target variable, or when splitting no longer adds value to the predictions. This process of
top-down induction of decision trees is an example of a greedy algorithm, and it is the most
common strategy for learning decision trees.
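As an illustrative sketch of recursive partitioning in practice, the snippet below uses scikit-learn's DecisionTreeClassifier on a tiny, made-up buy_computer-style dataset (the feature encoding and values are hypothetical):

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: [age_group, income_level] encoded as small integers.
X = [[0, 2], [0, 1], [1, 2], [2, 1], [2, 0], [1, 0]]
y = [0, 0, 1, 1, 1, 0]  # 1 = likely to buy a computer

# Entropy-based splitting mirrors the ID3/C4.5 family described above.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)
predictions = tree.predict(X)
```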
Decision trees used in data mining are of two main types −
Classification tree − when the response is a nominal variable, for example if an
email is spam or not.
Regression tree − when the predicted outcome can be considered a real number (e.g.
the salary of a worker).
Decision trees are a simple method, and as such they have some problems. One of these
issues is the high variance of the models that decision trees produce. To alleviate this
problem, ensemble methods of decision trees were developed. There are two groups of
ensemble methods currently used extensively −
Bagging decision trees − Build multiple decision trees by repeatedly resampling the
training data with replacement, and let the trees vote for a consensus prediction. A
well-known algorithm of this kind is the random forest.
Boosting decision trees − Gradient boosting combines weak learners (in this case,
decision trees) into a single strong learner in an iterative fashion. It fits a weak tree to
the data and iteratively keeps fitting weak learners to correct the errors of the
previous model.
# Install the party package
# install.packages('party')
library(party)
library(ggplot2)
head(diamonds)

# We will predict the cut of diamonds using the features
# available in the diamonds dataset.
ct <- ctree(cut ~ ., data = diamonds)
# Split the Ames housing data (uses the rsample and AmesHousing packages)
library(rsample)
set.seed(123)
ames_split <- initial_split(AmesHousing::make_ames(), prop = .7, strata = "Sale_Price")
ames_train <- training(ames_split)
ames_test <- testing(ames_split)
The basic idea
Some previous tutorials (i.e. linear regression, logistic regression, regularized regression)
have focused on linear models. In those tutorials we illustrated some of the advantages of
linear models such as their ease and speed of computation and also the intuitive nature of
interpreting their coefficients. However, linear models make a strong assumption about
linearity, and this assumption is often a poor one, which can affect predictive accuracy.
We can extend linear models to capture non-linear relationships. Typically, this is done by
explicitly including polynomial parameters or step functions. Polynomial regression is a
form of regression in which the relationship between the independent variable x and the
dependent variable y is modeled as an nth degree polynomial of x. For example, Equation
1 represents a polynomial regression function where y is modeled as a function
of x with d degrees. Generally speaking, it is unusual to use d greater than 3 or 4: as
d becomes larger, the function fit becomes overly flexible and oddly
shaped, especially near the boundaries of the range of x values.
yi = β0 + β1xi + β2xi^2 + β3xi^3 + ⋯ + βdxi^d + ϵi        (1)
An alternative to polynomial regression is step function regression. Whereas polynomial
functions impose a global non-linear relationship, step functions break the range of x into
bins, and fit a different constant for each bin. This amounts to converting a continuous
variable into an ordered categorical variable such that our linear regression function is
converted to Equation 2
yi = β0 + β1C1(xi) + β2C2(xi) + β3C3(xi) + ⋯ + βdCd(xi) + ϵi        (2)
where C1(x) represents x values ranging from c1 ≤ x < c2, C2(x) represents x values
ranging from c2 ≤ x < c3, …, and Cd(x) represents x values ranging from
cd−1 ≤ x < cd. Figure 1 illustrates polynomial and step function fits
for Sale_Price as a function of Year_Built in our ames data.
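As a quick sketch of explicit polynomial regression, the snippet below fits a degree-3 polynomial with NumPy's polyfit on made-up data (the coefficients and noise level are arbitrary):

```python
import numpy as np

# Hypothetical data: a cubic signal plus Gaussian noise.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 2 + 0.5 * x - 0.3 * x**2 + 0.05 * x**3 + rng.normal(scale=0.5, size=50)

coeffs = np.polyfit(x, y, deg=3)   # highest-degree coefficient first
y_hat = np.polyval(coeffs, x)      # fitted values
```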
Figure 1: Blue line represents predicted Sale_Price values as a function of Year_Built for
alternative approaches to modeling explicit nonlinear regression patterns. (A) Traditional
nonlinear regression approach does not capture any nonlinearity unless the predictor or
response is transformed (i.e. log transformation). (B) Degree-2 polynomial, (C) Degree-3
polynomial, (D) Step function fitting cutting Year_Built into three categorical levels.
Although useful, typical implementations of polynomial regression and step functions
require the user to explicitly identify which variables should have what specific degree of
interaction, or at what points of a variable x cut points should be made for the step
functions. Considering that many data sets today can easily contain 50, 100, or more
features, this would require an enormous and unnecessary time commitment from an
analyst to determine these explicit non-linear settings.
Multivariate regression splines
Multivariate adaptive regression splines (MARS) provide a convenient approach to capture
the nonlinearity aspect of polynomial regression by assessing cutpoints (knots) similar to
step functions. The procedure assesses each data point for each predictor as a knot and
creates a linear regression model with the candidate feature(s). For example, consider our
simple model of Sale_Price ~ Year_Built. The MARS procedure will first look for the single
point across the range of Year_Built values where two different linear relationships
between Sale_Price and Year_Built achieve the smallest error. What results is known as a
hinge function (h(x − a), where a is the cutpoint value). For a single knot (Figure 2
(A)), our hinge function is h(Year_Built − 1968), such that our two linear
models for Sale_Price are

Sale_Price = 136091.022                                         if Year_Built ≤ 1968
Sale_Price = 136091.022 + 3094.208(Year_Built − 1968)           if Year_Built > 1968
Once the first knot has been found, the search continues for a second knot which is found at
2006 (Figure 2 (B)). This results in three linear models for Sale_Price:
Sale_Price = 136091.022                                         if Year_Built ≤ 1968
Sale_Price = 136091.022 + 2898.424(Year_Built − 1968)           if 1968 < Year_Built ≤ 2006
Sale_Price = 136091.022 + 20176.284(Year_Built − 2006)          if Year_Built > 2006
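The two-knot model can be evaluated directly as a piecewise function; a small Python sketch using the coefficients quoted in the text:

```python
def h(x, a):
    """Hinge function h(x - a) = max(0, x - a)."""
    return max(0.0, x - a)

def predict_sale_price(year_built):
    # Piecewise model with knots at 1968 and 2006, as quoted in the text.
    if year_built <= 1968:
        return 136091.022
    elif year_built <= 2006:
        return 136091.022 + 2898.424 * (year_built - 1968)
    else:
        return 136091.022 + 20176.284 * (year_built - 2006)
```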
Figure 2: Examples of fitted regression splines of one (A), two (B), three (C), and four (D)
knots.
This procedure can continue until many knots are found, producing a highly non-linear
pattern. Although including many knots may allow us to fit a very good relationship with
our training data, it may not generalize well to new, unseen data. For example, Figure 3
includes nine knots, but this likely will not generalize very well to our test data.
Figure 3: Too many knots may not generalize well to unseen data.
Consequently, once the full set of knots has been created, we can sequentially remove
knots that do not contribute significantly to predictive accuracy. This process is known as
"pruning," and we can use cross-validation, as we have with the previous models, to find the
optimal number of knots.
Fitting a basic MARS model
We can fit a MARS model with the earth package. By default, earth::earth() will assess all
potential knots across all supplied features and then prune to the optimal number of
knots based on an expected change in R² (for the training data) of less than 0.001. This
calculation is performed by the generalized cross-validation (GCV) procedure,
a computational shortcut for linear models that produces an error value
that approximates leave-one-out cross-validation (see here for technical details).
Note: The term “MARS” is trademarked and licensed exclusively to Salford Systems
http://www.salfordsystems.com. We can use MARS as an abbreviation; however, it cannot
be used for competing software solutions. This is why the R package uses the name earth.
The following applies a basic MARS model to our ames data and performs a search for
required knots across all features. The results show us the final model's GCV statistic,
generalized R² (GRSq), and more.
# Fit a basic MARS model
library(earth)
mars1 <- earth(
  Sale_Price ~ .,
  data = ames_train
)
Figure 4: Model summary capturing GCV R² (left-hand y-axis and solid black line)
based on the number of terms retained (x-axis), which in turn is based on the number of
predictors used to make those terms (right-hand y-axis). For this model, 37 non-intercept
terms were retained, based on 26 predictors. Any additional terms retained in the model,
over and above these 37, would improve the GCV R² by less than 0.001.
In addition to pruning the number of knots, earth::earth() allows us to also assess potential
interactions between different hinge functions. The following illustrates this by including
a degree = 2 argument. You can see that now our model includes interaction terms between
multiple hinge functions; for example, h(Year_Built-2003)*h(Gr_Liv_Area-2274) is an
interaction effect for those houses built prior to 2003 that have less than 2,274 square feet
of living space above ground.
# Fit a MARS model with second-degree interactions
mars2 <- earth(
  Sale_Price ~ .,
  data = ames_train,
  degree = 2
)
# Tuning grid (hypothetical setup chosen to match the output below)
hyper_grid <- expand.grid(
  degree = 1:3,
  nprune = floor(seq(2, 100, length.out = 10))
)

head(hyper_grid)
##   degree nprune
## 1      1      2
## 2      2      2
## 3      3      2
## 4      1     12
## 5      2     12
## 6      3     12
We can use caret to perform a grid search using 10-fold cross-validation. The model that
provides the optimal combination includes second degree interactions and retains 34 terms.
The cross-validated RMSE for these models is illustrated in Figure 5, and the optimal
model's cross-validated RMSE is $24,021.68.
This Artificial Neural Network tutorial provides basic and advanced concepts of ANNs. It
is developed for beginners as well as professionals.
The term "artificial neural network" refers to a biologically inspired sub-field of artificial
intelligence modeled after the brain. An artificial neural network is a computational
network based on biological neural networks, which form the structure of the human brain.
Just as the human brain has neurons interconnected with one another, artificial neural
networks also have neurons that are linked to each other in various layers of the network.
These neurons are known as nodes.
Artificial neural network tutorial covers all the aspects related to the artificial neural
network. In this tutorial, we will discuss ANNs, Adaptive resonance theory, Kohonen self-
organizing map, Building blocks, unsupervised learning, Genetic algorithm, etc.
What is Artificial Neural Network?
The term "Artificial Neural Network" is derived from biological neural networks, which
form the structure of the human brain. Just as the human brain has neurons interconnected
with one another, artificial neural networks have neurons interconnected in various layers
of the network. These neurons are known as nodes.
The given figure illustrates the typical diagram of Biological Neural Network.
The typical Artificial Neural Network looks something like the given figure.
Dendrites from Biological Neural Network represent inputs in Artificial Neural Networks,
cell nucleus represents Nodes, synapse represents Weights, and Axon represents Output.
Relationship between a biological neural network and an artificial neural network:
Dendrites → Inputs
Cell nucleus → Nodes
Synapse → Weights
Axon → Output
An artificial neural network is an attempt, in the field of artificial intelligence, to mimic
the network of neurons that makes up the human brain, so that computers can understand
things and make decisions in a human-like manner. The artificial neural network is
designed by programming computers to behave simply like interconnected brain cells.
There are tens of billions of neurons in the human brain (commonly estimated at around
86 billion). Each neuron connects to somewhere between 1,000 and 100,000 others. In the
human brain, data is stored in a distributed manner, and we can extract more than one
piece of this data from memory in parallel when necessary. We can say that the human
brain is made up of incredibly powerful parallel processors.
We can understand the artificial neural network with an example. Consider a digital logic
gate that takes inputs and gives an output, such as an "OR" gate, which takes two inputs.
If one or both inputs are "On," the output is "On." If both inputs are "Off," the output is
"Off." Here the output depends only on the input. Our brain does not perform the same
task: the output-to-input relationship keeps changing, because the neurons in our brain
are "learning."
The architecture of an artificial neural network:
To understand the architecture of an artificial neural network, we have to understand what
a neural network consists of: a large number of artificial neurons, termed units, arranged
in a sequence of layers. Let us look at the various types of layers available in an artificial
neural network.
Artificial Neural Network primarily consists of three layers:
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the
programmer.
Hidden Layer:
The hidden layer sits between the input and output layers. It performs all the calculations
to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.
The artificial neural network takes input and computes the weighted sum of the inputs and
includes a bias. This computation is represented in the form of a transfer function.
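As a sketch of this computation, the snippet below forms the weighted sum of the inputs plus a bias and passes it through a sigmoid transfer function (one common choice; the input and weight values are made up):

```python
import math

def neuron_output(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias term.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid transfer (activation) function squashes z into (0, 1).
    return 1 / (1 + math.exp(-z))

out = neuron_output([1.0, 0.5], [0.4, -0.2], 0.1)  # z = 0.4
```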
CNNs are widely used to identify satellite images, process medical images, forecast time
series, and detect anomalies.
How Do CNNs Work?
CNNs have multiple layers that process and extract features from data:
Convolution Layer
A CNN has a convolution layer with several filters that perform the convolution
operation.
Rectified Linear Unit (ReLU)
CNNs have a ReLU layer that applies an element-wise activation function. The output is a
rectified feature map.
Pooling Layer
The rectified feature map next feeds into a pooling layer. Pooling is a down-sampling
operation that reduces the dimensions of the feature map.
The two-dimensional arrays from the pooled feature map are then converted into a single,
long, continuous, linear vector by a flattening operation.
Fully Connected Layer
A fully connected layer forms when the flattened matrix from the pooling layer is fed
as an input, which classifies and identifies the images.
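The layer pipeline described above (convolution → ReLU → pooling → flattening) can be sketched with plain NumPy; the image, kernel, and sizes below are made up for illustration:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0)        # element-wise rectification

def max_pool(x, size=2):
    """Down-sample by taking the max of each size x size block."""
    oh, ow = x.shape[0] // size, x.shape[1] // size
    return x[:oh * size, :ow * size].reshape(oh, size, ow, size).max(axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[-1.0, 0.0], [0.0, 1.0]])
feature_map = relu(conv2d(image, kernel))  # rectified feature map
pooled = max_pool(feature_map)             # pooling layer output
flattened = pooled.flatten()               # fed to a fully connected layer
```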
Below is an example of an image processed via CNN.
2. Long Short Term Memory Networks (LSTMs)
LSTMs are a type of Recurrent Neural Network (RNN) that can learn and memorize long-
term dependencies. Recalling past information for long periods is the default behavior.
LSTMs retain information over time. They are useful in time-series prediction because they
remember previous inputs. LSTMs have a chain-like structure where four interacting layers
communicate in a unique way. Besides time-series predictions, LSTMs are typically used for
speech recognition, music composition, and pharmaceutical development.
How Do LSTMs Work?
First, they forget irrelevant parts of the previous state
Next, they selectively update the cell-state values
Finally, they output selected parts of the cell state
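These three steps correspond to the standard LSTM gate equations, where σ is the sigmoid function, ⊙ denotes element-wise multiplication, and [h_{t−1}, x_t] concatenates the previous hidden state with the current input:

```latex
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) && \text{(input gate)} \\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) && \text{(candidate cell state)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(cell-state update)} \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(C_t) && \text{(new hidden state)}
\end{aligned}
```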
Below is a diagram of how LSTMs operate:
10. Autoencoders
Autoencoders are a specific type of feedforward neural network in which the input and
output are identical. Geoffrey Hinton designed autoencoders in the 1980s to solve
unsupervised learning problems. They are trained neural networks that replicate the data
from the input layer to the output layer. Autoencoders are used for purposes such as
pharmaceutical discovery, popularity prediction, and image processing.
How Do Autoencoders Work?
An autoencoder consists of three main components: the encoder, the code, and the decoder.
Autoencoders are structured to receive an input and transform it into a different
representation. They then attempt to reconstruct the original input as accurately as
possible.
When an image of a digit is not clearly visible, it is fed to an autoencoder neural
network.
Autoencoders first encode the image, then reduce the size of the input into a smaller
representation.
Finally, the autoencoder decodes the image to generate the reconstructed image.
The following image demonstrates how autoencoders operate:
SVM FOR REGRESSION
These are the three kernels that SVM uses most often:
Linear kernel: Dot product between two given observations
Polynomial kernel: This allows curved lines in the input space
Radial Basis Function (RBF): It creates complex regions within the feature space
In general, regression problems involve the task of deriving a mapping function that
approximates the relationship between input variables and a continuous output variable.
Support Vector Regression uses the same principle of Support Vector Machines. In other
words, the approach of using SVMs to solve regression problems is called Support Vector
Regression or SVR.
Now let us look at the classic example of the Boston House Price dataset. It has 14
columns: one target variable and 13 independent variables, with 506 rows/records. The
approach will be to use SVR with all three kernels, compare the error metrics, choose the
model with the least error, and then make an observation about whether the dataset is
linear or not. The dataset is available at Boston Housing Dataset.
Though the dataset looks simple and we could do a traditional regression, we want to see
whether the dataset is linear or not and how SVR performs with the various kernels. Here
we will focus on MAE and RMSE only; the lower the value, the better the model. You can
look at the code, which is available in my GitHub repository mentioned in the reference
section.
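Since the original repository is not reproduced here, the sketch below illustrates the same kernel comparison with scikit-learn's SVR on synthetic non-linear data (the data and all values are illustrative, not the Boston results):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

# Synthetic non-linear data standing in for the housing dataset.
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Fit SVR with each kernel and compare training MAE.
maes = {}
for kernel in ("linear", "poly", "rbf"):
    model = SVR(kernel=kernel).fit(X, y)
    maes[kernel] = mean_absolute_error(y, model.predict(X))
```

On non-linear data like this, the RBF kernel should yield a lower MAE than the linear kernel, mirroring the comparison described in the text.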
Based on the above results, we can say that the dataset is non-linear and that Support
Vector Regression (SVR) performs better than traditional regression. There is a caveat,
however: it performs well only with non-linear kernels. In our case, SVR performs worst
with the linear kernel, moderately well with the polynomial kernel, and best with the RBF
(or Gaussian) kernel. Maybe I wouldn't have been left with burnt potato chips had I used
SVR while selecting the potatoes, as they were non-linear even though they looked very
similar. I hope this article was useful in understanding the basics of SVM and SVR, and
when we need to use SVR.