
Machine Learning: Machine Learning is the science (and art) of programming computers so they learn from data.


Machine Learning is the field of study that gives computers the
ability to learn without being explicitly programmed.

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
Example: Spam Filter- A Machine Learning program learns from historical emails that have already been labelled.

Feature (Independent Variable)    Target Variable (Label)
Email-1                           Spam
Email-2                           Ham
Email-3                           Ham
Email-4                           Spam
Email-5                           Ham

Task T: flag new emails as spam or ham.
Experience E: the training data on which the model is built.
Performance measure P: how accurately incoming emails are classified as spam or ham.
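The T/E/P definition can be made concrete with a toy spam filter. This is a minimal sketch, not a real spam-filtering algorithm (such as Naive Bayes): the emails are hypothetical, and the "model" simply counts how often each word appeared in spam versus ham during training. The labelled list plays the role of experience E, `predict` performs task T, and `accuracy` computes performance measure P.

```python
from collections import Counter

# Experience E: hypothetical labelled training emails (not from the original notes).
TRAINING_DATA = [
    ("win a free prize now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the team tomorrow", "ham"),
    ("claim your free prize", "spam"),
    ("project report attached", "ham"),
]


def train(examples):
    """Learn from data: count how often each word appears in spam and in ham emails."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts


def predict(counts, text):
    """Task T: flag an email as spam or ham by comparing word-count scores (ties -> ham)."""
    words = text.split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["ham"][w] for w in words)
    return "spam" if spam_score > ham_score else "ham"


def accuracy(counts, examples):
    """Performance measure P: fraction of emails that are classified correctly."""
    correct = sum(predict(counts, text) == label for text, label in examples)
    return correct / len(examples)


model = train(TRAINING_DATA)
new_emails = [("free prize inside", "spam"), ("agenda for the project meeting", "ham")]
print(accuracy(model, new_emails))  # 1.0
```

Adding more labelled emails to `TRAINING_DATA` (more experience E) is what lets a program like this improve on P.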
ML Engineering- Creating and automating the model pipeline and deploying the model into production so that it can be used in real time.

Data Analytics- Python, visualization tools (Power BI, Tableau), SQL, PySpark (Big Data), Cloud Computing (Azure, GCP, AWS).
Data Science- Python, visualization tools (Power BI, Tableau), SQL, PySpark (Big Data), Cloud Computing (Azure, GCP, AWS), Statistics, Machine Learning.

Applications of Machine Learning:


1. Analysing images of products on a production line to
automatically classify them. This is a use case of Image
Classification- CNN (Convolutional Neural Networks)
2. Automatically Classifying News Articles- Text Classification-
NLP- Natural Language Processing
3. Automatically flagging offensive comments on discussion
forums- NLP- Text Classification
4. Summarizing long documents automatically- Text
Summarization- NLP
5. Creating a chatbot or a personal assistant- NLP and NLU-
Natural Language Understanding
6. Forecasting your company’s revenue next year, based on
many performance metrics- Regression
7. Detecting Credit Card Fraud- This is an application of
anomaly detection.
8. Segmenting clients based on their purchases so that you can
design a different marketing strategy for each segment-
Clustering (KMeans)
9. Recommending a product that a client may be interested in
based on past purchases- Recommendation engine.

Types of Machine Learning Systems


1. How they are supervised during training (supervised,
unsupervised, or reinforcement learning).
2. Whether or not they can learn incrementally on the fly (online
versus batch learning).

Training Supervision-
1. Supervised Learning- In supervised learning, the training set
you feed to the algorithm includes the desired solutions,
called labels.
a. Regression: the label is a continuous value (e.g., a price).
b. Classification: the label is a discrete class (e.g., spam or ham).
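Both kinds of supervised learning can be sketched in a few lines of plain Python. This is an illustrative sketch only: the data (years of experience vs. salary, hours studied vs. pass/fail) is hypothetical, regression uses ordinary least squares for a straight line, and classification uses a simple nearest-centroid rule chosen for brevity. In both cases the model is fitted on inputs paired with their labels.

```python
def fit_line(xs, ys):
    """Regression: ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_nearest_centroid(xs, labels):
    """Classification: remember the mean feature value of each class."""
    centroids = {}
    for label in set(labels):
        members = [x for x, lab in zip(xs, labels) if lab == label]
        centroids[label] = sum(members) / len(members)
    return centroids


def classify(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


# Hypothetical regression data: years of experience -> salary (thousands); continuous label.
slope, intercept = fit_line([1, 2, 3, 4], [30, 35, 40, 45])
print(slope, intercept)        # 5.0 25.0
print(slope * 6 + intercept)   # 55.0, predicted salary for 6 years

# Hypothetical classification data: hours studied -> "pass"/"fail"; discrete label.
centroids = fit_nearest_centroid([1, 2, 8, 9], ["fail", "fail", "pass", "pass"])
print(classify(centroids, 7))  # pass
```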

Unsupervised Learning: In unsupervised learning, the training data is unlabelled. For example, the customer data below has no target column:

Customer    Income    Spending
1           100K      50K
2           90K       40K
3           150K      100K
4           80K       60K
5           120K      90K
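Because the customer table has no labels, a typical next step is to group similar customers, for example with k-means clustering (the "Clustering (KMeans)" application listed earlier). The sketch below is a plain-Python version of the standard k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points). The choice of k = 2 and of customers 2 and 3 as starting centroids is an assumption made to keep the result reproducible.

```python
# Customer data from the table: (income, spending) in thousands; no labels.
CUSTOMERS = [(100, 50), (90, 40), (150, 100), (80, 60), (120, 90)]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, initial_centroids, max_iters=100):
    """Basic k-means: alternate assignment and update steps until assignments stop changing."""
    centroids = [tuple(map(float, c)) for c in initial_centroids]
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


labels, centroids = kmeans(CUSTOMERS, initial_centroids=[CUSTOMERS[1], CUSTOMERS[2]])
print(labels)     # [0, 0, 1, 0, 1]
print(centroids)  # [(90.0, 50.0), (135.0, 95.0)]
```

Customers 1, 2 and 4 end up in a lower income/spending segment and customers 3 and 5 in a higher one; each segment could then get its own marketing strategy.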
Reinforcement Learning: The learning system, called an agent (or robot), can observe the environment, select and perform actions, and get rewards or penalties in return.
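The agent/environment/reward loop can be illustrated with a multi-armed bandit, one of the simplest reinforcement learning settings. This is a sketch under stated assumptions: the environment is hypothetical (action 0 always gives a penalty of -1, action 1 always gives a reward of +1), and the agent uses an epsilon-greedy rule with sample-average value estimates. It is not a full RL algorithm such as Q-learning.

```python
import random


def run_agent(rewards, steps=100, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a hypothetical bandit environment with fixed rewards."""
    rng = random.Random(seed)
    n_actions = len(rewards)
    value = [0.0] * n_actions  # agent's current estimate of each action's reward
    counts = [0] * n_actions   # how many times each action has been taken
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try a random action
        else:
            action = max(range(n_actions), key=lambda a: value[a])  # exploit best estimate
        reward = rewards[action]  # environment responds with a reward or a penalty
        counts[action] += 1
        # Incremental sample-average update of the chosen action's value estimate.
        value[action] += (reward - value[action]) / counts[action]
    return value, counts


value, counts = run_agent(rewards=[-1.0, 1.0])
print(value, counts)
```

After a penalty for action 0 the agent quickly shifts to action 1, which it then chooses on almost every step; the occasional random (exploration) step is what lets an agent discover better actions in less trivial environments.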
Batch Versus Online Learning
Batch Learning- In batch learning, the system is incapable of learning incrementally; it must be trained using all the available data.
Example: if 50,000 records are available, the model is trained on all 50,000 of them at once (batch learning).

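Model Rot or Data Drift- The performance of a machine learning model goes down over a period of time, because the data it sees in production gradually drifts away from the data it was trained on.
To avoid such scenarios, keep updating (retraining) your model on a regular basis.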
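One simple way to notice model rot is to keep checking the model's accuracy on recent, labelled production data and compare it with the accuracy measured at deployment. The sketch below assumes such labelled feedback is available; the function names, the 0.05 tolerance, and the monitoring data are all hypothetical.

```python
def accuracy(pairs):
    """Fraction of (prediction, actual) pairs that match."""
    return sum(pred == actual for pred, actual in pairs) / len(pairs)


def needs_retraining(baseline_accuracy, recent_pairs, tolerance=0.05):
    """Flag model rot when recent accuracy falls more than `tolerance` below the baseline."""
    return accuracy(recent_pairs) < baseline_accuracy - tolerance


# Hypothetical monitoring data: (model prediction, true label) for recent emails.
recent = [("spam", "spam"), ("ham", "spam"), ("ham", "ham"), ("ham", "spam"),
          ("spam", "ham"), ("ham", "ham"), ("spam", "spam"), ("ham", "spam"),
          ("ham", "ham"), ("spam", "spam")]

print(accuracy(recent))                                             # 0.6
print(needs_retraining(baseline_accuracy=0.95, recent_pairs=recent))  # True
```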
Online Learning: In online learning, you train the system or model incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches.
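A minimal sketch of online learning, assuming a one-feature linear model y ≈ w·x + b updated by stochastic gradient descent: each call to `update` takes one gradient step on whatever arrived, a single instance (a list of one pair) or a mini-batch (a longer list). The class and method names are hypothetical; scikit-learn offers a comparable `partial_fit` method on some estimators such as `SGDRegressor`. The data stream here is simulated by replaying noiseless points from y = 2x + 1.

```python
class OnlineLinearRegressor:
    """Hypothetical minimal online learner for y ~ w * x + b, trained by SGD."""

    def __init__(self, learning_rate=0.02):
        self.w = 0.0
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, batch):
        """One gradient step on a list of (x, y) pairs: size 1 = single instance, more = mini-batch."""
        grad_w = grad_b = 0.0
        for x, y in batch:
            error = self.predict(x) - y
            grad_w += error * x  # derivative of 0.5 * error**2 with respect to w
            grad_b += error      # derivative of 0.5 * error**2 with respect to b
        n = len(batch)
        self.w -= self.lr * grad_w / n
        self.b -= self.lr * grad_b / n


# Simulated stream: instances from y = 2x + 1 arrive one at a time, over and over.
data = [(x, 2 * x + 1) for x in range(5)]
model = OnlineLinearRegressor()
for _ in range(500):
    for x, y in data:
        model.update([(x, y)])  # learn from each instance as soon as it arrives

print(round(model.w, 3), round(model.b, 3))
```

Unlike batch learning, the model never needs the whole dataset in memory at once, and it can keep adapting as new data keeps arriving.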

Main Challenges of Machine Learning


1. Insufficient Quantity of Training Data- Most Machine Learning algorithms need a lot of data to work well.
2. Non-representative Training Data- The training data must be representative of the new cases the model has to generalize to.
3. Poor-Quality Data- Missing values or outliers.
4. Overfitting- The model performs well on the training data, but it does not generalize well to new data.
5. Underfitting- It occurs when the model is too simple to learn the underlying structure of the data.
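Overfitting and underfitting show up when training error is compared with error on unseen data. In this sketch the data is hypothetical (roughly y = 2x + 1 plus noise), and three simple models stand in for the three cases: a constant mean predictor (too simple, so it underfits), a 1-nearest-neighbour model that copies the label of the closest training point (memorizes noise, so it overfits), and a least-squares straight line.

```python
# Hypothetical data: roughly y = 2x + 1, with noise in the training labels.
TRAIN = [(0, 1.2), (1, 2.8), (2, 5.1), (3, 7.0), (4, 8.9)]
TEST = [(0.4, 1.8), (1.4, 3.8), (2.4, 5.8), (3.4, 7.8)]


def mse(model, data):
    """Mean squared error of a model (a function x -> prediction) on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


mean_y = sum(y for _, y in TRAIN) / len(TRAIN)


def underfit_model(x):
    # Too simple: ignores x and always predicts the average training target.
    return mean_y


def overfit_model(x):
    # Too flexible: copies the label of the nearest training point, noise included.
    return min(TRAIN, key=lambda point: abs(point[0] - x))[1]


# Least-squares straight line, which matches the roughly linear structure of the data.
mean_x = sum(x for x, _ in TRAIN) / len(TRAIN)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in TRAIN)
         / sum((x - mean_x) ** 2 for x, _ in TRAIN))
intercept = mean_y - slope * mean_x


def line_model(x):
    return slope * x + intercept


for name, model in [("underfit", underfit_model), ("overfit", overfit_model), ("line", line_model)]:
    print(f"{name}: train MSE={mse(model, TRAIN):.3f}, test MSE={mse(model, TEST):.3f}")
```

The memorizing model has zero training error but a much larger test error than the line (it does not generalize), while the mean predictor has a high error on both sets (it is too simple).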

Testing and Validating


The only way to know how well a model will generalize to new
cases is to actually try it out on new cases.
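In practice, "trying it out on new cases" usually means holding out part of the data as a test set that the model never sees during training; an 80/20 split is a common choice. The helper below is a hypothetical sketch (scikit-learn provides a ready-made `train_test_split` in `sklearn.model_selection`). The fixed seed makes the split identical on every run.

```python
import random


def split_train_test(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows and hold out test_ratio of them as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split on every run
    n_test = int(len(shuffled) * test_ratio)
    test_set = shuffled[:n_test]
    train_set = shuffled[n_test:]
    return train_set, test_set


data = list(range(100))  # stand-in for 100 labelled examples
train_set, test_set = split_train_test(data)
print(len(train_set), len(test_set))  # 80 20
```

The model is fitted on `train_set` only; its error on `test_set` estimates how well it will generalize to new cases.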

Primary Data- Information that is collected for the first time, for example from a survey or an observational study.
Secondary Data- Data that has been collected and processed by some other agent, but which the researcher uses for his or her study.
Example: competitor data is secondary data.

Population and Sample


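Population- The entire group we want to study. For example, if we want to know the average height of males and females in Chennai city, the population is all males and females in Chennai.
Sample- A sample is a small proportion of the population, taken to study the characteristics of the population.
Sampling- Sampling is a technique adopted to select a sample. The sampling method used for selecting a sample is important in determining how closely the sample resembles the population.
Probability Sampling- Every member of the population has a known, non-zero chance of being selected (e.g., simple random sampling).
Non-Probability Sampling- Members are selected by a non-random rule (e.g., convenience sampling).
Sampling Error- The difference between a value computed from the sample (such as the sample mean) and the true value for the population, which arises because only part of the population is observed.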
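The height example can be simulated to see simple random sampling and sampling error side by side. Everything here is an assumption for illustration: the "population" is 10,000 synthetic heights drawn from a normal distribution (mean 165 cm, standard deviation 10 cm), not real data about Chennai.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: heights (cm) of 10,000 people, generated for illustration only.
population = [rng.gauss(165, 10) for _ in range(10_000)]

# Simple random sampling (a probability sampling method): every member has an equal chance.
sample = rng.sample(population, k=100)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)
sampling_error = sample_mean - population_mean  # sample statistic minus population value

print(f"population mean = {population_mean:.2f} cm")
print(f"sample mean     = {sample_mean:.2f} cm")
print(f"sampling error  = {sampling_error:.2f} cm")
```

Re-running with a different seed gives a different sample and a different (usually small) sampling error; larger samples tend to shrink it.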

Types of Data-
Qualitative Data- Category labels; discrete; represented with a bar diagram.
Quantitative Data- Numerical values, e.g., the sale price of a car over 1 year; continuous; represented with a histogram.
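The difference between the two chart types comes down to how the values are counted: a bar diagram counts each category, while a histogram first groups continuous values into ranges (bins). The data below is hypothetical and the bin edges (start 4, width 2) are an arbitrary choice for the sketch.

```python
from collections import Counter

# Hypothetical qualitative data: car body types (category labels).
body_types = ["SUV", "Sedan", "SUV", "Hatchback", "Sedan", "SUV"]
# Bar diagram: one bar per category, bar height = frequency.
print(Counter(body_types))  # Counter({'SUV': 3, 'Sedan': 2, 'Hatchback': 1})

# Hypothetical quantitative data: car sale prices over one year.
prices = [4.5, 5.2, 6.8, 7.1, 7.9, 8.4, 9.0, 11.5]


def histogram_counts(values, start, width, n_bins):
    """Count values in equal-width bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


# Histogram: bins [4, 6), [6, 8), [8, 10), [10, 12).
print(histogram_counts(prices, start=4, width=2, n_bins=4))  # [2, 3, 2, 1]
```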

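Measures of Central Tendency- A measure of central tendency is one single figure that describes the whole of the data.
Mean- Mathematical (arithmetic) average: the sum of the values divided by the number of values.
Median- Positional average, the middle value of the dataset: arrange the data in ascending order and pick the middle value (for an even number of values, take the average of the two middle values).
Mode- The value that occurs most frequently.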
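The three measures can be checked with Python's standard `statistics` module; the median is also written out by hand to mirror the "arrange in ascending order and pick the middle value" rule. The data values are hypothetical.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical values


def median(values):
    """Positional average: sort, then take the middle value (or the mean of the two middle values)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(statistics.mean(data))    # 5
print(median(data))             # 4.0
print(statistics.median(data))  # 4.0
print(statistics.mode(data))    # 3
```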

Relationship between Mean, Median and Mode

Mode ≈ 3 Median - 2 Mean (an empirical relationship that holds approximately for moderately skewed distributions)
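As a worked example of the empirical relationship, suppose a moderately skewed dataset has mean 50 and median 48 (hypothetical figures). Then Mode ≈ 3 × 48 - 2 × 50 = 144 - 100 = 44. The same arithmetic as a small helper:

```python
def empirical_mode(mean, median):
    """Approximate the mode from the empirical relationship Mode ~ 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean


print(empirical_mode(mean=50, median=48))  # 44
```

For a perfectly symmetric distribution the mean and median are equal, so the formula gives Mode = Mean = Median.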

Course duration: 32 hrs
4 hrs Introduction
12 hrs Python
4 hrs Statistics
12 hrs Machine Learning Projects on various algorithms
