
INTRODUCTION TO MACHINE LEARNING
1.1 Introduction
▪ Machine Learning (ML) is a sub-field of Artificial Intelligence (AI) concerned with developing
computational theories of learning and building learning machines.
▪ Learning is both a phenomenon and a process, and it manifests itself in various ways. The learning
process includes gaining new symbolic knowledge and developing cognitive skills through instruction
and practice.
▪ Machine Learning Definition: A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P,
improves with experience E.
▪ Application of machine learning methods to large databases is called data mining.
▪ The goal of machine learning is to build computer systems that can adapt and learn from their
experience.
▪ An algorithm is used to solve a problem on a computer. An algorithm is a sequence of instructions
that is carried out to transform the input into the output. For example, the addition of four numbers
is carried out by giving the four numbers as input to the algorithm; the output is the sum of all four numbers.
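As a minimal sketch of the four-number example, an algorithm can be written as a fixed sequence of instructions that turns the input into the output. The function name add_four is hypothetical and chosen only for illustration.

```python
def add_four(a, b, c, d):
    """Input: four numbers. Output: their sum."""
    total = a          # instruction 1: start with the first number
    total = total + b  # instruction 2: add the second number
    total = total + c  # instruction 3: add the third number
    total = total + d  # instruction 4: add the fourth number
    return total

result = add_four(1, 2, 3, 4)
```

Here every step is spelled out by the programmer. Machine learning is aimed at tasks for which such explicit instructions cannot easily be written down.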
Why is Machine Learning Important?
▪ Machine learning algorithms can figure out how to perform important tasks by generalizing from
examples.
▪ Machine Learning provides business insight and intelligence. Decision makers are provided with
greater insights into their organizations. 
▪ Machine learning algorithms discover the relationships between the variables of a system (input,
output and hidden) from direct samples of the system.
▪ Some of the reasons are as follows:
1. Some tasks cannot be defined well, except by examples. For example: recognizing people.
2. Relationships and correlations can be hidden within large amounts of data. Machine learning and
data mining may be able to find these relationships.
3. Human designers often produce machines that do not work as well as desired in the environments
in which they are used.
How Machines Learn?
▪ Machine learning typically follows three phases:
▪ Training: A training set of examples of correct
behavior is analyzed and some representation of
the newly learnt knowledge is stored. This is some
form of rules.
▪ Validation: The rules are checked and, if necessary,
additional training is given. Sometimes additional
test data are used; alternatively, a human expert
may validate the rules, or some other automatic
knowledge-based component may be used. The
role of the tester is often called the opponent.
▪ Application: The rules are used in responding to
some new situation.
1.1.1 How Do Machines Learn?
▪ Machine learning is a form of Artificial Intelligence (AI) that teaches computers to think in a
similar way to how humans do: learning and improving upon past experiences. It works by exploring
data and identifying patterns, and involves minimal human intervention.
▪ The machine learning process is divided into three parts: data input, abstraction and generalization.
▪ Data input: Information is used for future decision making.
▪ Abstraction: Input data is represented in a broader way through the underlying algorithm.
▪ Generalization: It forms a framework for making decisions.
Abstraction
▪ During the machine learning process, knowledge is fed in the form of input data.
Collected data is raw data. It cannot be used directly for processing.
▪ A model, in the machine learning paradigm, is a summarized knowledge representation of
the raw data. The model may be in any one of the following forms:
1. Mathematical equations.
2. Specific data structure like trees.
3. Logical grouping of similar observations.
4. Computational blocks.  
▪ Choosing the model used to solve a specific learning problem is a human task.
Some of the factors that guide this choice are as follows:
a)  Type of problem to be solved.
b)  Nature of the input data.
c)  Problem domain.
1.1.2 Well Posed Learning Problem

▪ Definition: A computer program is said to learn from experience E with respect to some class of tasks
T and performance measure P, if its performance at tasks in T, as measured by P, improves with
experience E.
▪ A well-posed learning problem is identified by three features:
1. Class of tasks
2. Measure of performance to be improved
3. Source of experience
What are T, P, E? How do we formulate a machine learning problem?
1. Task T: Driving on public, 4-lane highway using vision sensors. 
2. Performance measure P: Average distance traveled before an error (as judged by human overseer).
3. Training experience E: A sequence of images and steering commands recorded while observing a
human driver.
1.2 Types of Machine Learning
▪ Learning is constructing or
modifying a representation of
what is being experienced.
To learn means to gain knowledge
through study, experience or being
taught.
▪ Machine learning is usually
divided into three types:
Supervised, unsupervised and
reinforcement learning.
1.2.1 Supervised Learning
▪ Supervised learning is the machine learning task of inferring a function from supervised training data.
The training data consist of a set of training examples. 
▪ In supervised learning, the network is trained by providing it with input and matching output
patterns. These input-output pairs are usually provided by an external teacher.
▪ Human learning is based on the past experiences. A computer does not have experiences.
▪ A supervised learning algorithm analyzes the training data and produces an inferred function, which is
called a classifier or a regression function. Fig. 1.2.2 shows the supervised learning process.
▪ Each input vector requires a
corresponding target vector.
Training Pair =  (Input Vector, Target
Vector)
▪ Supervised learning denotes a
method in which some input
vectors are collected and presented
to the network.
▪ Supervised learning is further
divided into methods which use
reinforcement or error correction.
The perceptron learning algorithm
is an example of supervised
learning with reinforcement.
▪ In order to solve a given problem of supervised learning, the following steps
are performed:
1. Find out the type of training examples.
2. Collect a training set.
3. Determine the input feature representation of the learned function.
4. Determine the structure of the learned function and the corresponding
learning algorithm.
5. Complete the design and then run the learning algorithm on the
collected training set.
6. Evaluate the accuracy of the learned function. After parameter
adjustment and learning, the performance of the resulting function
should be measured on a test set that is separate from the training set.
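The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data, the helper names, and the choice of a 1-nearest-neighbour rule as the "learned function" are all hypothetical and are not prescribed by the text.

```python
import math
import random

def train_test_split(examples, test_fraction=0.25, seed=0):
    """Steps 2 and 6: shuffle the (input vector, target) pairs and hold out a test set."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def nearest_neighbour_classifier(training_set):
    """Steps 4 and 5: the learned function returns the label of the closest training input."""
    def predict(x):
        _, label = min(training_set, key=lambda pair: math.dist(pair[0], x))
        return label
    return predict

def accuracy(classifier, test_set):
    """Step 6: fraction of test examples classified correctly."""
    correct = sum(1 for x, y in test_set if classifier(x) == y)
    return correct / len(test_set)

# Hypothetical toy data (steps 1 and 3): 2-D input vectors with class labels "A" and "B".
examples = [((0.0 + i * 0.1, 0.0), "A") for i in range(8)] + \
           [((5.0 + i * 0.1, 5.0), "B") for i in range(8)]

train_set, test_set = train_test_split(examples)
classifier = nearest_neighbour_classifier(train_set)
test_accuracy = accuracy(classifier, test_set)
```

The important design point is that test_accuracy is computed only on examples that were never shown to the learner.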
1.2.1.1 Classification
▪ Classification predicts categorical labels (classes), whereas prediction models
continuous-valued functions.
▪ Prediction means modelling continuous-valued functions, i.e., predicting
unknown or missing values.
▪ Aim: To predict categorical class labels for new samples.
▪ Input: Training set of samples, each with a class label.
▪ Output: A classifier based on the training set and the class labels.
▪ Prediction is similar to classification. It constructs a model and uses it to predict
an unknown or missing value.
▪ Classification and prediction may need to be preceded by relevance
analysis, which attempts to identify attributes that do not contribute to the
classification or prediction process.
1.2.1.2 Regression

▪ For an input x, if the output is continuous, this is called a regression problem. For
example, based on historical information on the demand for toothpaste in your
supermarket, you are asked to predict the demand for the next month.

▪ Regression is concerned with the prediction of continuous quantities. Linear
regression is one of the oldest and most widely used predictive models in the field of
machine learning.

▪ For regression tasks, the typical accuracy metrics are Root Mean Square Error
(RMSE) and Mean Absolute Percentage Error (MAPE). These metrics measure the
distance between the predicted numeric target and the actual numeric answer.
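A minimal sketch of the two metrics, using their standard definitions. The demand figures are made up for illustration, and the function names are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared difference."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent; every actual value must be non-zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical monthly demand and the corresponding model predictions.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

rmse_value = rmse(actual, predicted)
mape_value = mape(actual, predicted)
```

RMSE is expressed in the units of the target, while MAPE is relative, which makes it easier to compare across targets of different scale.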
Regression Line
▪ Least squares: The least squares regression line is the line that makes the sum of squared residuals as small as
possible. Linear means “straight line”.
▪ The regression line is the line which gives the best estimate of one variable from the value of any other given variable.
▪ The regression line gives the average relationship between the two variables in mathematical form.
▪ For two variables X and Y, there are always two lines of regression.
▪ Regression line of X on Y: Gives the best estimate for the value of X for any specific given value of Y:
X = a + bY
▪ Regression line of Y on X: Gives the best estimate for the value of Y for any specific given value of X:

Y = a + bX
Where
a = Intercept (the X-intercept for the line of X on Y, the Y-intercept for the line of Y on X)
b = Slope of the line
In the line of X on Y, X is the dependent variable and Y is the independent variable.
In the line of Y on X, Y is the dependent variable and X is the independent variable.
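The two regression lines can be computed with the usual closed-form least squares estimates (slope = covariance divided by the variance of the independent variable). The following is a sketch with made-up data; fit_line is a hypothetical helper name.

```python
def fit_line(independent, dependent):
    """Return (a, b) of the least squares line: dependent = a + b * independent."""
    n = len(independent)
    mean_i = sum(independent) / n
    mean_d = sum(dependent) / n
    s_id = sum((i - mean_i) * (d - mean_d) for i, d in zip(independent, dependent))
    s_ii = sum((i - mean_i) ** 2 for i in independent)
    b = s_id / s_ii          # slope
    a = mean_d - b * mean_i  # intercept
    return a, b

# Hypothetical observations lying exactly on Y = 1 + 2X.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

a_yx, b_yx = fit_line(xs, ys)  # regression line of Y on X
a_xy, b_xy = fit_line(ys, xs)  # regression line of X on Y
```

With noisy data the two lines generally differ, because each one minimizes the squared residuals of a different variable.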
Evaluating a Regression Model
▪ Assume we want to predict a car’s price using some features such as
dimensions, horsepower, engine specification, mileage etc. This is a
typical regression problem, where the target variable (price) is a
continuous numeric value.

Advantages:
▪ Training a linear regression model is usually much faster than
methods such as neural networks.
▪ Linear regression models are simple and require minimum memory
to implement.
Assessing Performance of Regression- Error Measures
▪ The training error is the mean error over the training sample.
The test error is the expected prediction error over an
independent test sample.

▪ Unlike decision trees, which predict class labels, regression trees
and model trees are used for prediction of continuous values. In
regression trees, each leaf stores a continuous-valued prediction.
In model trees, each leaf holds a regression model.
1.2.2 Unsupervised Learning
▪ The model is not provided with the correct results during training. It can
be used to cluster the input data into classes on the basis of their statistical
properties only.
▪ Cluster significance and labeling: The labeling can be carried out even if the
labels are only available for a small number of objects representative of the
desired classes. All similar input patterns are grouped together as clusters.
▪ An external teacher is not used and learning is based upon only local
information. It is also referred to as self-organization.
▪ These methods are called unsupervised because they do not need a teacher or
supervisor to label a set of training examples. Only the original data is
required to start the analysis.
1.2.2.1 Clustering
▪ Clustering of data is a method by which large sets of
data are grouped into clusters of smaller sets of similar
data. Clustering can be considered the most important
unsupervised learning problem.
▪ A cluster is therefore a collection of objects which are
“similar” to one another and “dissimilar” to the
objects belonging to other clusters. Fig. 1.2.8 shows
a cluster.
▪ In this case we easily identify the 4 clusters into which
the data can be divided; the similarity criterion is
distance: two or more objects belong to the same
cluster if they are “close” according to a given distance
(in this case geometrical distance). This is called
distance-based clustering.
▪ Clustering means grouping of data or dividing a large
data set into smaller data sets of some similarity.
▪ Cluster centroid: The centroid of a cluster is a point
whose parameter values are the mean of the parameter
values of all the points in the cluster. Each cluster has a
well-defined centroid.
▪ Distance: The distance between two points is taken as a
common metric to assess the similarity among the
components of a population. The commonly used distance
measure is the Euclidean metric, which defines the distance
between two points p = (p1, p2, ...) and q = (q1, q2, ...) as:
   $d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$
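A short sketch of the two definitions above in Python. The points are made up for illustration and the function names are hypothetical.

```python
import math

def euclidean_distance(p, q):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def centroid(points):
    """Point whose coordinates are the mean of the coordinates of all points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

# Hypothetical cluster of three 2-D points.
cluster = [(1.0, 1.0), (3.0, 1.0), (2.0, 4.0)]
center = centroid(cluster)
distance = euclidean_distance((0.0, 0.0), (3.0, 4.0))
```

In distance-based clustering, a point is typically assigned to the cluster whose centroid is closest under this metric.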
Clustering algorithms may be classified as listed below:
1. Exclusive clustering
2. Overlapping clustering
3. Hierarchical clustering
4. Probabilistic clustering
▪ Examples of Clustering Applications
▪ Marketing: Helping marketers discover distinct groups in their
customer bases and then use this knowledge to develop targeted
marketing programs.
▪ Land use: Identification of areas of similar land use in an earth
observation database.
▪ Insurance: Identifying groups of motor insurance policy holders with
a high average claim cost.
▪ Urban planning: Identifying groups of houses according to their
house type, value, and geographical location.
▪ Seismology: Observed earthquake epicenters should be clustered
along continental faults.
1.2.3 Reinforcement Learning
▪ In supervised learning the learner gets immediate
feedback, and in unsupervised learning it gets no feedback.
In reinforcement learning, the learner gets delayed
scalar feedback.
▪ Reinforcement learning is learning what to do and how
to map situations to actions. The learner is not told
which actions to take.
▪ Reinforcement learning deals with agents that must
sense and act upon their environment.
▪ It allows machines and software agents to automatically
determine the ideal behavior within a specific context,
in order to maximize their performance.
▪ Example of Reinforcement Learning: A mobile robot
decides whether it should enter a new room in search
of more trash to collect or start trying to find its way
back to its battery recharging station. It makes its
decision based on how quickly and easily it has been
able to find the recharger in the past.
1.2.3.1 Elements of Reinforcement Learning
▪ Reinforcement learning elements are as follows:
1. Policy
2. Reward Function
3. Value Function 
4. Model of the environment
Policy: A policy defines the learning agent's behavior at a given time.
Reward function: The reward function is used to define the goal in a
reinforcement learning problem.
Value function: Value functions specify what is good in the long run.
Model of the environment: Models are used for planning.
▪ The agent has sensors to decide on its state in the environment
and takes an action that modifies its state.
▪ Reinforcement learning is a technique for solving Markov Decision
Problems.
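To make the four elements concrete, here is a small sketch on a hypothetical four-position corridor in which position 3 is the goal. Everything in it is an assumption made for illustration: the environment, the discount factor, and the use of value iteration, which is a planning method that requires the model to be known and is not described in the text above.

```python
STATES = [0, 1, 2, 3]       # position 3 is the terminal goal state
ACTIONS = ["left", "right"]
GAMMA = 0.9                 # discount factor (assumed)

def model(state, action):
    """Model of the environment: the deterministic next state."""
    if action == "right":
        return min(state + 1, 3)
    return max(state - 1, 0)

def reward(state, action):
    """Reward function: 1 for stepping into the goal, otherwise 0."""
    return 1.0 if state != 3 and model(state, action) == 3 else 0.0

def value_iteration(sweeps=50):
    """Value function: discounted long-run reward obtainable from each state."""
    values = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES:
            if s == 3:
                continue  # the terminal state keeps value 0
            values[s] = max(reward(s, a) + GAMMA * values[model(s, a)]
                            for a in ACTIONS)
    return values

def greedy_policy(values):
    """Policy: the action taken in each non-terminal state."""
    return {s: max(ACTIONS,
                   key=lambda a: reward(s, a) + GAMMA * values[model(s, a)])
            for s in STATES if s != 3}

values = value_iteration()
policy = greedy_policy(values)
```

The reward is only received at the last step, yet the value function propagates it backwards, which is how delayed feedback still guides the earlier actions.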
1.2.4 Difference between Supervised, Unsupervised and Reinforcement Learning

| Supervised Learning | Unsupervised Learning | Reinforcement Learning |
|---|---|---|
| Supervised learning requires that the target variable is well defined and that a sufficient number of its values are given. | For unsupervised learning, typically either the target variable is unknown or has only been recorded for too small a number of cases. | Reinforcement learning is learning what to do and how to map situations to actions. The learner is not told which actions to take. |
| Supervised learning deals with two main tasks: regression and classification. | Unsupervised learning deals with clustering and association rule mining problems. | Reinforcement learning deals with exploitation or exploration, Markov decision processes, policy learning, deep learning and value learning. |
| The input data in supervised learning is labelled data. | Unsupervised learning uses unlabelled data. | The data is not predefined in reinforcement learning. |
| Learns by using labelled data. | Trained using unlabelled data without any guidance. | Works by interacting with the environment. |
1.3 Application of Machine Learning
▪ Examples of successful applications of machine learning: 
1.  Learning to recognize spoken words.
2. Learning to drive an autonomous vehicle.
3. Learning to classify new astronomical structures.
4.  Learning to play world-class backgammon.
5. Spoken language understanding: within the context of a limited
domain, determine the meaning of something uttered by a
speaker to the extent that it can be classified into one of a fixed
set of categories.
Face Recognition
▪ We perform the face recognition task effortlessly: every day we recognize our friends, relatives and
family members. We also recognize them by looking at photographs, in which they appear in different
poses, hair styles and background light, with and without makeup.
▪ We do it subconsciously and cannot explain how we do it. Because we can’t explain how we do it, we
can’t write an algorithm.
Healthcare:
▪ With the advent of wearable sensors and devices that use data to assess the health of a patient in real
time, ML is becoming a fast-growing trend in healthcare.
▪ Sensors in wearables provide real-time patient information, such as overall health condition, heartbeat,
blood pressure and other vital parameters.
Financial services:
▪ Companies in the financial sector are able to identify key insights in financial data as well as prevent
any occurrences of financial fraud, with the help of machine learning technology.
▪ The technology is also used to identify opportunities for investments and trade.
1.4 Hypothesis Space
▪ A hypothesis represents a function approximation for the target function. It is
used to estimate or predict the target value Y, based on the input
dataset X, the model parameters, and the hyper-parameters. It is represented
using the letter h.
▪ The hypothesis is also referred to as a model. The hypothesis can be
represented as Y = h(X).
▪ If H comprises all possible subsets of X, we cannot learn anything new
beyond the training data in D, because the labels c(x) of instances x
outside D can independently and arbitrarily be 0 or 1. That is, we have no
inductive bias.
▪ Search space: The space of all feasible solutions is called the search space.
Each point in the search space represents one feasible solution. Each
feasible solution can be “marked” by its value or fitness for the problem.
▪ We are looking for our solution, which is one or more points among the feasible
solutions, that is, points in the search space.
Motivation:
▪ The solution(s) to machine learning tasks are often called hypotheses, because they can be expressed as
a hypothesis that the observed positives and negatives for a categorization are explained by the concept
learned for the solution.
▪ General definition of a hypothesis: “A hypothesis is a statement of a relationship between two or more
variables”.

Reason for using hypotheses


When learning from a limited-size database indicating the effectiveness of different medical treatments, it is
important to understand as precisely as possible the accuracy of the learned hypotheses.
Evaluating hypotheses is an integral component of many learning methods.
Estimating the accuracy of a hypothesis is relatively straightforward when data is plentiful.
An estimator is any random variable used to estimate some parameter of the underlying population from
which a sample is drawn.
1. The estimation bias of an estimator Y for an arbitrary parameter p is E[Y] - p. If the estimation bias is 0,
then Y is an unbiased estimator for p.
2. The variance of an estimator Y for an arbitrary parameter p is simply the variance of Y.
1.5 Inductive Bias
▪ The Candidate-Elimination algorithm will converge toward the true target concept
provided it is given accurate training examples and provided its initial hypothesis space
contains the target concept. 
▪ What if the target concept is not contained in the hypothesis space?
▪ Can we avoid this difficulty by using a hypothesis space that includes every possible
hypothesis?
▪ How does the size of this hypothesis space influence the ability of the algorithm to
generalize to unobserved instances?
▪ In the EnjoySport example, we restricted the hypothesis space to include only conjunctions of
attribute values. Because of this restriction, the hypothesis space is unable to represent
even simple disjunctive target concepts such as “Sky = Sunny or Sky = Cloudy.”
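| Example | Sky | Air Temp | Humidity | Wind | Water | Forecast | Enjoy Sport |
|---|---|---|---|---|---|---|---|
| 1 | Sunny | Warm | Normal | Strong | Cool | Change | YES |
| 2 | Cloudy | Warm | Normal | Strong | Cool | Change | YES |
| 3 | Rainy | Warm | Normal | Strong | Cool | Change | NO |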

• From the first two examples: S2: <?, Warm, Normal, Strong, Cool, Change>

• This hypothesis is inconsistent with the third example, because it also covers that negative
example, and there are no hypotheses consistent with these three examples.

• PROBLEM: We have biased the learner to consider only conjunctive hypotheses. We
require a more expressive hypothesis space.

• The obvious solution to the problem of assuring that the target concept is in the hypothesis space H
is to provide a hypothesis space capable of representing every teachable concept.
Inductive Bias – Fundamental Property of Inductive Inference:
▪ A learner that makes no a priori assumptions regarding the identity of the target concept has no
rational basis for classifying any unseen instances.
▪ Inductive Leap: A learner should be able to generalize training data using prior assumptions
in order to classify unseen instances.
▪ The generalization is known as inductive leap and our prior assumptions are the inductive
bias of the learner.
Inductive Bias Formal Definition
▪ Consider a concept learning algorithm L for the set of instances X. Let c be an arbitrary
concept defined over X, and let Dc = {<x, c(x)>} be an arbitrary set of training examples of
c.
▪ Let L(xi, Dc) denote the classification assigned to the instance xi by L after training on the
data Dc.
▪ The inductive bias of L is any minimal set of assertions B such that for any target concept c
and corresponding training examples Dc the following formula holds:
   $(\forall x_i \in X)\,[(B \wedge D_c \wedge x_i) \vdash L(x_i, D_c)]$
Three Learning Algorithms:

▪ ROTE-LEARNER: Learning corresponds simply to storing each observed training example in memory.
Subsequent instances are classified by looking them up in memory. If the instance is found in memory,
the stored classification is returned. Otherwise, the system refuses to classify the new instance.
Inductive Bias: No inductive bias

▪ CANDIDATE-ELIMINATION: New instances are classified only in the case where all members of the
current version space agree on the classification. Otherwise, the system refuses to classify the new
instance. Inductive Bias: the target concept can be represented in its hypothesis space.

▪ FIND-S: This algorithm, described in Section 1.6.4, finds the most specific hypothesis consistent with the
training examples. It then uses this hypothesis to classify all subsequent instances. Inductive Bias: The
target concept can be represented in its hypothesis space, and all instances are negative instances
unless the opposite is entailed by its other knowledge.
1.6 Evaluation and Cross Validation
▪ Cross-validation is a technique for estimating the performance of machine learning
models by training several models on subsets of the available input data and
evaluating them on the complementary subset of the data.

▪ Part of the data is removed before training begins. When training is
done, the data that was removed can be used to test the performance of the
learned model on “new” data. This is the basic idea for a whole class of model
evaluation methods called cross validation.

▪ Types of cross validation methods are holdout, K-fold and Leave-one-out.

▪ The K-fold cross validation is one way to improve over the holdout method. The
data set is divided into k subsets, and the holdout method is repeated k times.

▪ Leave-one-out cross validation is K-fold cross validation taken to its logical extreme,
with K equal to N, the number of data points in the set.

▪ Cross-validation ensures non-overlapping test sets.


K-fold cross-validation :
▪ In this technique, k-1 folds are used for
training and the remaining one for testing, as
shown in the accompanying figure.
▪ The advantage is that the entire data set is used for
both training and testing. The error rate of the
model is the average of the error rates of the
iterations.
▪ This technique can also be called a form of the
repeated hold-out method. The error rate
could be improved by using the stratification
technique.
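The procedure can be sketched as follows. The sketch assumes contiguous, unshuffled folds and uses a deliberately trivial "model" (the mean of the training targets) only so that the example is self-contained. All function names and data are hypothetical.

```python
def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size

def cross_validated_error(data, k, train_fn, error_fn):
    """Average the test error over the k iterations."""
    errors = []
    for train_idx, test_idx in k_fold_splits(len(data), k):
        model = train_fn([data[i] for i in train_idx])
        errors.append(error_fn(model, [data[i] for i in test_idx]))
    return sum(errors) / len(errors)

def train_mean(train):
    """Hypothetical learner: always predict the mean of the training targets."""
    return sum(train) / len(train)

def mean_abs_error(model, test):
    return sum(abs(y - model) for y in test) / len(test)

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
splits = list(k_fold_splits(len(data), 3))
cv_error = cross_validated_error(data, 3, train_mean, mean_abs_error)
```

Calling k_fold_splits(len(data), len(data)) gives leave-one-out cross validation, since every fold then contains exactly one data point.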
1.6.1 Evaluating Performance of a Model
▪ Classification is a major task of supervised learning. The
responsibility of the classification model is to assign a class
label to the target feature based on the values of the
predictor features.
▪ When performing classification predictions, there are four
types of outcomes that could occur. They are summarized in a
confusion matrix, which is also called a contingency table.
1) True positives are when you predict an observation
belongs to a class and it actually does belong to that class.
2) True negatives are when you predict an observation
does not belong to a class and it actually does not belong
to that class.
3) False positives occur when you predict an observation
belongs to a class when in reality it does not.
4) False negatives occur when you predict an observation
does not belong to a class when in fact it does.
▪ For any classification model, model accuracy is given by total
number of correct classifications (True Positive or True
Negative) divided by total number of classifications done.

▪ The complement of the accuracy rate is the error rate, which
evaluates a classifier by its percentage of incorrect predictions.

Error rate = 1 - (Accuracy rate)


▪ Recall is the proportion of actual positive cases that are predicted as positive:
Recall = TP / (TP + FN)
▪ Specificity is a statistical measure of how well a binary
classification test correctly identifies the negative cases:
Specificity = TN / (TN + FP)
▪ Precision measures how good our model is when the
prediction is positive:
Precision = TP / (TP + FP)
▪ The Kappa value of a model indicates the model accuracy adjusted for the accuracy expected by chance:
 Kappa = (Total accuracy - Random accuracy) / (1 - Random accuracy)
▪ The F₁ score is the harmonic mean of precision and recall:
 F₁ score = 2 × Precision × Recall / (Precision + Recall)

▪ Total accuracy is simply the sum of true positives and true negatives,
divided by the total number of items, that is:
 Total accuracy = (TP + TN) / (TP + TN + FP + FN)
▪ Random accuracy is defined as the sum of the products of reference
likelihood and result likelihood for each class. That is,
 Random accuracy = (Actual False × Predicted False + Actual True × Predicted True) / (Total × Total)
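The measures in this section can all be derived from the four confusion matrix counts. The sketch below uses made-up counts; the function name and the dictionary keys are hypothetical, and no guard against division by zero is included.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the evaluation measures from the four confusion matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)

    # Random accuracy: agreement expected by chance from the class totals.
    actual_true, actual_false = tp + fn, tn + fp
    predicted_true, predicted_false = tp + fp, tn + fn
    random_accuracy = (actual_false * predicted_false
                       + actual_true * predicted_true) / (total * total)
    kappa = (accuracy - random_accuracy) / (1 - random_accuracy)

    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }

# Hypothetical counts for a binary classifier evaluated on 100 items.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

For these counts the accuracy is 0.85 but the kappa is lower, because half of the agreement would already be expected by chance.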
ROC Curve:
▪ Receiver Operating Characteristics (ROC) graphs have long been used in signal detection theory to
depict the trade-off between hit rates and false alarm rates over noisy channel. 
▪ An ROC curve summarizes the performance of the model at different threshold values by combining
the confusion matrices at all threshold values.
▪ ROC curves are typically used in binary classification to study the output of a classifier.
▪ An ROC plot plots true positive rate on the Y-axis against false positive rate on the X-axis; a single
contingency table corresponds to a single point in an ROC plot.
▪ An ROC curve is convex if the slopes are monotonically non-increasing when moving along the
curve from (0, 0) to (1, 1). A concavity in an ROC curve, i.e., two or more adjacent segments with
increasing slopes, indicates a locally worse than random ranking.

▪ True Positive Rate (TPR) is a synonym for recall and is therefore defined as follows:
TPR = TP / (TP + FN)
▪ False Positive Rate (FPR) is defined as follows:
FPR = FP / (FP + TN)
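As a sketch of how the points of an ROC plot arise, the following computes one (FPR, TPR) pair per threshold. The scores, labels and thresholds are made up, and the rule "predict positive when score >= threshold" is an assumption made for this example.

```python
def roc_points(scores, labels, thresholds):
    """Return one (FPR, TPR) point per threshold; labels are 1 (positive) or 0."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        predicted = [s >= t for s in scores]  # one confusion matrix per threshold
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical classifier scores and true labels.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
points = roc_points(scores, labels, thresholds=[1.0, 0.7, 0.5, 0.0])
```

Lowering the threshold moves the point from (0, 0), where nothing is predicted positive, toward (1, 1), where everything is.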
1.6.2 Concept Learning
▪ Inducing general functions from specific training examples is a main issue of machine learning.
▪ Concept learning: Acquiring the definition of a general category from given sample positive and negative
training examples of the category.
▪ The hypothesis space has a general-to-specific ordering of hypotheses, and the search can be efficiently
organized by taking advantage of a naturally occurring structure over the hypothesis space.
▪ Formal definition for concept learning: Inferring a boolean-valued function from training examples of its input
and output.
▪ An example of concept learning is the learning of the bird concept from given examples of birds (positive
examples) and non-birds (negative examples).
▪ The inductive learning hypothesis: Any hypothesis found to approximate the target function well over a
sufficiently large set of training examples will also approximate the target function well over other unobserved
examples.
▪ Lacking any further information, our assumption is that the best hypothesis regarding unseen instances is the
hypothesis that best fits the observed training data. This is the fundamental assumption of inductive learning.
1.6.3 Concept Learning as Search
▪ Concept learning can be viewed as the task of searching through a large space of
hypotheses implicitly defined by the hypothesis representation.
▪ The goal of this search is to find the hypothesis that best fits the training
examples.

▪ A hypothesis is a vector of constraints, one for each attribute. Each constraint can:
1. Indicate by a “?” that any value is acceptable for this attribute
2. Specify a single required value for the attribute
3. Indicate by a “Ø” that no value is acceptable
▪ If some instance x satisfies all the constraints of hypothesis h, then h classifies x
as a positive example (h(x) = 1).
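A minimal sketch of this representation: a hypothesis is stored as a tuple of constraints and h(x) is computed by checking each constraint against the corresponding attribute value. The names ANY, NONE and classify are hypothetical, and the attribute values are taken from the EnjoySport example.

```python
ANY = "?"    # any value is acceptable
NONE = "Ø"   # no value is acceptable

def classify(hypothesis, instance):
    """Return 1 if the instance satisfies every constraint of the hypothesis, else 0."""
    for constraint, value in zip(hypothesis, instance):
        if constraint == NONE:
            return 0                      # nothing can satisfy this constraint
        if constraint != ANY and constraint != value:
            return 0                      # the required value does not match
    return 1

h = ("Sunny", ANY, ANY, "Strong", ANY, ANY)
x_positive = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
x_negative = ("Rainy", "Cold", "High", "Strong", "Warm", "Change")
most_specific = (NONE,) * 6
```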
General-to-Specific Ordering of Hypotheses
Many algorithms for concept learning organize the search through the hypothesis spaces by relying on a
general-to-specific ordering of hypotheses.
▪ By taking advantage of this naturally occurring structure over the hypothesis space, we can design learning
algorithms that exhaustively search even infinite hypothesis spaces without explicitly enumerating every
hypothesis.
▪ Consider two hypotheses:
▪ h₁ = (Sunny, ?, ?, Strong, ?, ?)
▪ h₂ = (Sunny, ?, ?, ?, ?, ?)
▪ Because h₂ imposes fewer constraints, any instance classified as positive by h₁ is also classified as
positive by h₂. Therefore h₂ is more general than h₁.
▪ One learning method is to determine the most specific hypothesis that matches all the training data.
▪ More-General-Than-Or-Equal Relation: Let h₁ and h₂ be two boolean-valued functions defined over X. Then h₁
is more-general-than-or-equal-to h₂ (written h₁ ≥ h₂) if and only if any instance that satisfies h₂ also satisfies h₁.
▪ h₁ is more-general-than h₂ (h₁ > h₂) if and only if h₁ ≥ h₂ is true and h₂ ≥ h₁ is false. We also say h₂ is
more-specific-than h₁.
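For conjunctive hypotheses written as constraint vectors, the relation can be checked attribute by attribute without enumerating instances. The sketch below assumes that every attribute has at least two possible values; the function and constant names are hypothetical.

```python
ANY = "?"    # any value is acceptable
NONE = "Ø"   # no value is acceptable

def more_general_or_equal(h1, h2):
    """True if every instance that satisfies h2 also satisfies h1."""
    if NONE in h2:
        return True  # h2 is satisfied by no instance, so the condition holds vacuously
    # Otherwise each constraint of h1 must be "?" or identical to the one in h2.
    return all(c1 == ANY or c1 == c2 for c1, c2 in zip(h1, h2))

def more_general(h1, h2):
    """Strict version: h1 >= h2 holds but h2 >= h1 does not."""
    return more_general_or_equal(h1, h2) and not more_general_or_equal(h2, h1)

h1 = ("Sunny", ANY, ANY, "Strong", ANY, ANY)
h2 = ("Sunny", ANY, ANY, ANY, ANY, ANY)
```

With these two hypotheses, more_general(h2, h1) holds, while more_general_or_equal(h1, h2) does not.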
1.6.4 Find S Algorithm
▪ The FIND-S algorithm starts from the most specific hypothesis
and generalizes it by considering only positive examples.
▪ This algorithm ignores negative examples. As long as the
hypothesis space contains a hypothesis that describes the true
target concept, and the training data contains no errors,
ignoring negative examples does not cause any problem.
▪ The FIND-S algorithm finds the most specific hypothesis
within H that is consistent with the positive training
examples.
▪ The final hypothesis will also be consistent with the negative
examples if the correct target concept is in H and the
training examples are correct.

| Example | Sky | Air Temp | Humidity | Wind | Water | Forecast | Enjoy Sport |
|---|---|---|---|---|---|---|---|
| 1 | Sunny | Warm | Normal | Strong | Warm | Same | YES |
| 2 | Sunny | Warm | High | Strong | Warm | Same | YES |
| 3 | Rainy | Cold | High | Strong | Warm | Change | NO |
| 4 | Sunny | Warm | High | Strong | Cold | Change | YES |

▪ h = <Ø, Ø, Ø, Ø, Ø, Ø> (initial, most specific hypothesis)
▪ h = <Sunny, Warm, Normal, Strong, Warm, Same> (after example 1)
▪ h = <Sunny, Warm, ?, Strong, Warm, Same> (after example 2; example 3 is negative and is ignored)
▪ h = <Sunny, Warm, ?, Strong, ?, ?> (after example 4)
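The trace above can be reproduced with a short sketch of FIND-S for conjunctive hypotheses. The function name find_s and the encoding of the labels as booleans are assumptions made for this example.

```python
def find_s(examples):
    """Return the most specific conjunctive hypothesis covering all positive examples."""
    n_attributes = len(examples[0][0])
    h = ["Ø"] * n_attributes              # start from the most specific hypothesis
    for instance, is_positive in examples:
        if not is_positive:
            continue                      # negative examples are ignored
        for i, value in enumerate(instance):
            if h[i] == "Ø":
                h[i] = value              # first positive example: copy its values
            elif h[i] != value:
                h[i] = "?"                # conflicting values: generalize the constraint
    return h

# The four EnjoySport training examples from the table.
training_examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cold", "Change"), True),
]

hypothesis = find_s(training_examples)
```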
