
MACHINE LEARNING

PROGRAM: B.TECH (CSE-DATA SCIENCE)

SEM-V

TAKEN BY: PROF. SHWETA LOONKAR

shwetaloonkar@gmail.com
Syllabus
Unit | Description | Duration
1 | Introduction: What is Machine Learning. Supervised Learning. Unsupervised Learning. | 2
2 | Linear Model Selection and Regularization: Linear regression. Hypothesis representation. Gradient descent. Cost function. Linear regression with multiple variables. Polynomial regression. Logistic regression. Hypothesis representation. Gradient descent. Cost function. Linear regression with multiple variables. Normal Equation. Polynomial regression. Regularization. | 8
3 | Moving Beyond Linearity: Neural networks. Hypothesis representation. Cost function. Back propagation. Activation function. | 5
4 | Machine Learning System Design: Evaluating hypothesis. Train – Validation – Test. Bias and variance curves. Error analysis. Error metrics for skewed classes. Precision and bias tradeoff. | 2
5 | Tree-Based Methods: The Basics of Decision Trees, Regression Trees, Classification Trees, Trees Versus Linear Models, Advantages and Disadvantages of Trees, Bagging, Random Forests, Boosting | 4
6 | Support Vector Machines: Maximal Margin Classifier, Support Vector Classifiers, Support Vector Machines, SVMs with More than Two Classes, Relationship to Logistic Regression, ROC Curves, Application to Gene Expression Data | 4
7 | Unsupervised Learning: The Challenge of Unsupervised Learning, Principal Components Analysis, Clustering Methods, K-Means Clustering, Hierarchical Clustering, Anomaly detection and large scale machine learning. | 5
Total | | 30
Teaching and Evaluation Scheme
Program: B. Tech. CSDS | Semester: II
Course/Module: Machine Learning | Module Code:

Teaching Scheme
Lecture (Hours per week) | Practical (Hours per week) | Tutorial (Hours per week) | Credit
2 | 2 | 0 | 3

Evaluation Scheme
Internal Continuous Assessment (ICA) (Marks - 50) | Term End Examinations (TEE) (Marks - 100 in Question Paper)
Marks Scaled to 50 | Marks Scaled to 50
What is Learning?
• Learning is a process that improves the knowledge of an AI program by making observations about its environment.
• To understand the different types of AI learning models, we can use two of the main elements of human learning processes:
  • Knowledge: From the knowledge perspective, learning models can be classified based on the representation of input and output data points.
  • Feedback: AI learning models can be classified based on their interactions with the outside environment, users and other external factors.

Difference Between AI, ML and DL


Existence of AI, ML and Deep Learning
What is Machine Learning?
Machine learning (ML) is a branch of artificial intelligence (AI) that enables computers to “self-learn” from training data and improve over time, without being explicitly programmed. Machine learning algorithms detect patterns in data and learn from them in order to make their own predictions.
State of the Art Applications for ML
What are the steps involved in building Machine Learning models?
Any machine learning model development can broadly be divided into six steps:
• Problem definition involves converting a business problem into a machine learning problem
• Hypothesis generation is the process of creating a possible business hypothesis and potential features for the model
• Data collection requires you to collect the data for testing your hypothesis and building the model
• Data exploration and cleaning helps you handle outliers and missing values and then transform the data into the required format
• Modeling is where you actually build the machine learning models
• Deployment is where you put the built models into use
Supervised Learning
• Supervised learning, as the name indicates, involves a supervisor acting as a teacher.
• In supervised learning, we teach or train the machine using data that is well labeled.
• That is, each training example is already tagged with the correct answer.
• The supervised learning algorithm analyses the training data (the set of training examples) and learns a model. The machine is then given a new set of examples (data), and the model is used to predict the correct outcome for them.
Supervised Learning Process: Two Steps

• Learning (training): Learn a model using the training data
• Testing: Test the model using unseen test data to assess the model accuracy

\[ \text{Accuracy} = \frac{\text{Number of correct classifications}}{\text{Total number of test cases}} \]
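The accuracy formula can be sketched in a few lines of Python. This is only an illustration: the function name `accuracy` and the two small label lists are made up for the example and do not come from the slides.

```python
def accuracy(predicted, actual):
    """Fraction of test cases whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one test case")
    # Number of correct classifications divided by total number of test cases.
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical test labels and model predictions (illustrative only).
actual_labels = ["Yes", "No", "Yes", "Yes", "No"]
predicted_labels = ["Yes", "No", "No", "Yes", "No"]
test_accuracy = accuracy(predicted_labels, actual_labels)
print(test_accuracy)
```

Here four of the five hypothetical predictions match the true labels, so the function returns 4/5.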

CS583, Bing Liu, UIC
What do we mean by Learning?
• Given

• a data set D,

• a task T, and

• a performance measure M,

a computer system is said to learn from D to perform the task T if after learning the system’s
performance on T improves as measured by M.

• In other words, the learned model helps the system to perform T better as compared to no
learning.
An Example
• Data: Loan application data
• Task: Predict whether a loan should be approved or not.
• Performance measure: accuracy.

No learning: classify all future applications (test data) to the majority class (i.e., Yes):
Accuracy = 9/15 = 60%.
• We can do better than 60% with learning.
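The "no learning" baseline can be reproduced with a short sketch. Only the class counts (9 "Yes" and 6 "No" out of 15) are taken from the example; the loan records themselves are not shown here, so the label list below is a stand-in built from those counts.

```python
from collections import Counter

# Assumption: only the class counts from the example are used (9 "Yes", 6 "No").
labels = ["Yes"] * 9 + ["No"] * 6

# "No learning": always predict the majority class.
majority_class, majority_count = Counter(labels).most_common(1)[0]
baseline_predictions = [majority_class] * len(labels)

correct = sum(p == y for p, y in zip(baseline_predictions, labels))
baseline_accuracy = correct / len(labels)
print(majority_class, baseline_accuracy)
```

Any learned model is worth keeping only if its accuracy on test data is higher than this baseline.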
Fundamental Assumption of Learning
Assumption: The distribution of training examples is identical to the distribution of test
examples (including future unseen examples).

• In practice, this assumption is often violated to a certain degree.


• Strong violations will clearly result in poor classification accuracy.
• To achieve good accuracy on the test data, training examples must be sufficiently
representative of the test data.
Steps Involved in Supervised Learning:
• First, determine the type of training dataset.
• Collect/gather the labelled training data.
• Split the dataset into a training dataset, a validation dataset, and a test dataset.
• Determine the input features of the training dataset, which should carry enough information for the model to predict the output accurately.
• Determine a suitable algorithm for the model, such as support vector machine, decision tree, etc.
• Execute the algorithm on the training dataset. Sometimes a validation set, held out from the training data, is needed to tune control parameters.
• Evaluate the accuracy of the model on the test set. If the model predicts the correct outputs for the test examples, the model is accurate.
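The steps above can be sketched end to end in plain Python. Everything specific here is an assumption made for illustration: the data is synthetic (two Gaussian blobs), the split is 60/20/20, and a k-nearest-neighbour classifier stands in for the "suitable algorithm", with the validation set used to pick the control parameter k.

```python
import math
import random
from collections import Counter


def make_dataset(rng, n_per_class=30):
    """Synthetic labelled data: two Gaussian blobs (hypothetical, for illustration)."""
    data = []
    for label, center in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            features = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
            data.append((features, label))
    return data


def knn_predict(train, point, k):
    """Predict by majority vote among the k nearest training examples."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, examples, k):
    """Share of examples whose predicted label equals the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in examples)
    return correct / len(examples)


rng = random.Random(0)
data = make_dataset(rng)
rng.shuffle(data)

# Split: 60% training, 20% validation, 20% test.
n = len(data)
n_train, n_val = n * 6 // 10, n * 2 // 10
train = data[:n_train]
val = data[n_train:n_train + n_val]
test = data[n_train + n_val:]

# Use the validation set to choose the control parameter k.
best_k = max((1, 3, 5), key=lambda k: accuracy(train, val, k))

# Evaluate once on the held-out test set.
test_accuracy = accuracy(train, test, best_k)
print(best_k, test_accuracy)
```

The test set is touched only once, after k has been fixed, so the reported accuracy is measured on data the model has not seen.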
Types of Supervised Learning
• Supervised learning can be further divided into two types of problems: classification and regression.
Unsupervised Learning
• Unsupervised learning is the training of a machine using information that is neither classified
nor labeled.
• It allows the algorithm to act on that information without guidance.
• Here the task of the machine is to group unsorted information according to similarities, patterns, and differences without any prior training on labeled data.
• Unlike supervised learning, no teacher is provided, which means the machine receives no labeled examples to train on. The machine therefore has to find the hidden structure in unlabeled data by itself.
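One way to picture "grouping by similarity without labels" is k-means clustering, which appears later in the syllabus. The sketch below is a minimal one-dimensional version; the function name `k_means_1d`, the six data points, and the starting centroids are all made up for illustration.

```python
def k_means_1d(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled data with two visible groups.
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centroids, clusters = k_means_1d(points, centroids=[1.0, 2.0])
print(centroids, clusters)
```

No labels are supplied at any point; the two groups emerge only from the distances between the points.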
Reinforcement Learning
• Reinforcement learning is an area of Machine Learning.
• It is about taking suitable action to maximize reward in a particular situation.
• It is employed by various software and machines to find the best possible behavior or path to take in a specific situation.
• Reinforcement learning differs from supervised learning in that, in supervised learning, the training data comes with the answer key, so the model is trained with the correct answers.
• In reinforcement learning, there is no answer key; the reinforcement agent decides what to do to perform the given task. In the absence of a training dataset, it has to learn from its experience.
Terminologies Used in Reinforcement Learning
Agent – the sole decision-maker and learner
Environment – the world (physical or simulated) in which the agent learns and decides which actions to perform
Action – one of the set of actions that the agent can perform
State – the current situation of the agent in the environment
Reward – for each action selected by the agent, the environment gives a reward. It is usually a scalar value and is simply feedback from the environment
Policy – the strategy (decision-making rule) the agent uses to map situations to actions
Value Function – the value of a state is the total reward the agent expects to accumulate starting from that state and following the policy
Model – not every RL agent uses a model of its environment. When one is used, it is the agent’s view of the environment, mapping state-action pairs to probability distributions over the states
Reinforcement Learning Workflow
– Create the environment
– Define the reward
– Create the agent
– Train and validate the agent
– Deploy the policy
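The workflow above can be walked through on a toy problem. The sketch below uses tabular Q-learning, one common RL algorithm that the slides do not name, so treat it as one possible choice. The five-cell corridor environment, the reward of 1 at the goal, and all parameter values are assumptions made for illustration.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = ("left", "right")


def step(state, action):
    """Environment and reward: move one cell; reward 1 only on reaching the goal."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy agent: explore sometimes, otherwise act greedily (random tie-break)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state].values())
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Train the agent with the Q-learning update rule."""
    rng = random.Random(seed)
    q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(1000):  # step cap so an episode always ends
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            future = 0.0 if done else gamma * max(q[next_state].values())
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# "Deploy" the policy: take the greedy action in every non-goal state.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

The agent is never told that moving right is correct; it only receives the scalar reward from the environment and learns the policy from its own experience.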
Semi Supervised Learning
• Semi-supervised learning applies where an incomplete training signal is given: a training set with some (often many) of the target outputs missing.
• A special case of this principle is transduction, where the entire set of problem instances is known at learning time, but part of the targets are missing.
• Semi-supervised learning combines a small amount of labeled data with a large amount of unlabeled data during training. It falls between unsupervised learning and supervised learning.
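A minimal sketch of the idea is self-training (pseudo-labelling) with a nearest-neighbour rule: the unlabeled point closest to an already labeled point receives that point's label, and the process repeats. This is only one simple semi-supervised technique, chosen here for brevity; the function name `self_train` and the data are hypothetical.

```python
def self_train(labeled, unlabeled):
    """Repeatedly pseudo-label the unlabeled point closest to any labeled point."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        # The closest (unlabeled, labeled) pair is treated as the most confident guess.
        x, (_, y) = min(
            ((u, l) for u in remaining for l in labeled),
            key=lambda pair: abs(pair[0] - pair[1][0]),
        )
        labeled.append((x, y))   # the pseudo-labeled point now counts as labeled
        remaining.remove(x)
    return labeled


# Hypothetical data: two labeled points and six unlabeled ones.
labeled_points = [(1.0, "A"), (9.0, "B")]
unlabeled_points = [2.0, 3.0, 4.0, 6.5, 7.5, 8.0]
result = dict(self_train(labeled_points, unlabeled_points))
print(result)
```

With only two labeled examples, the unlabeled points still help: each newly pseudo-labeled point extends the labeled set that later decisions are based on.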
What are some of the latest achievements and developments in machine learning?
Some of the latest achievements of machine learning include:
• Winning Dota 2 against professional players (OpenAI’s development)
• Beating Lee Sedol at the traditional game of Go (Google DeepMind’s algorithm)
• Google cutting the energy used for cooling its data centers by up to 40% by using Machine Learning
• Writing entire essays and poetry, and creating movies from scratch using Natural Language Processing (NLP) techniques (multiple breakthroughs, the latest being OpenAI’s GPT-2)
• Creating and generating images and videos from scratch (this is both incredibly creative and worryingly accurate)
• Building automated machine learning models. This is revolutionizing the field by expanding the circle of people who can work with machine learning to include non-technical folks as well
• Building machine learning models in the browser itself! (A Google creation: TensorFlow.js)
What are some of the Challenges in the adoption of Machine Learning?
While machine learning has made tremendous progress in the last few years, there are some big challenges that still need to be solved. It is an area of active research and I expect a lot of effort to go into solving these problems in the coming years.
• Huge data required: It takes a huge amount of data to train a model today. For example, if you want to classify cats vs. dogs based on images (and you don’t use an existing model), you would need the model to be trained on thousands of images. Compare that to a human: we typically explain the difference between a cat and a dog to a child by using 2 or 3 photos.
• High compute required: As of now, machine learning and deep learning models require huge computations to achieve simple tasks (simple according to humans). This is why special hardware, including GPUs and TPUs, is required. The cost of computation needs to come down for machine learning to make a next-level impact.
• Interpretation of models is difficult at times: Some modeling techniques can give us high accuracy but are difficult to explain. This can leave the business owners frustrated. Imagine being a bank that cannot tell a customer why their loan was declined!
• New and better algorithms required: Researchers are constantly looking for new and better algorithms to address some of the problems mentioned above.
• More Data Scientists needed: Since the domain has grown so quickly, there aren’t many people with the skill sets required to solve the vast variety of problems. This is expected to remain so for the next few years. So, if you are thinking about building a career in machine learning, you are in a good position!
