
INGENIERÍA Y DESARROLLO TECNOLÓGICO

COURSE NOTES

1.- INFORMATION

AREA/DEPARTMENT R&D PROJECT DATE: 29/SEPT/2021

CUSTOMER/SPONSOR Erick Rodríguez Martínez | CEO IDT LINK https://www.coursera.org/learn/machine-learning-with-python/lecture/GjNfa/welcome

DISCUSSED ISSUE Course Notes | Machine Learning with Python by IBM

2.- ATTENDEES:

# ID NAME POSITION # ID NAME POSITION

3.- AGENDA

1.

4.- NOTES

01 1. WELCOME W1

Form: 0101FR001
Effective: 30/Jun/2021

 In this module, you will learn about applications of Machine Learning in
different fields such as health care, banking, telecommunication, and so
on. You'll get a general overview of Machine Learning topics such as
supervised vs unsupervised learning, and the usage of each algorithm.
You'll also understand the advantages of using Python libraries for
implementing Machine Learning models.

02  Learning Objectives
o Give examples of Machine Learning in various industries.
o Outline the steps machine learning uses to solve problems.
o Provide examples of various techniques used in machine learning.
o Describe the Python libraries for Machine Learning.
o Explain the differences between Supervised and Unsupervised
algorithms.
o Describe the capabilities of various algorithms.

03 2. INTRODUCTION TO MACHINE LEARNING W1.2


04  "Machine learning is the subfield of computer science that gives
computers the ability to learn without being explicitly programmed."
(Arthur Samuel) W1.2

 There are many examples of Machine Learning applications, such as
the recommendation systems of Amazon and Netflix, bank loan approval
algorithms, customer churn prediction at telecommunications companies,
and so on.

 Major Machine Learning Techniques
o Regression/Estimation technique
 Used for predicting a continuous value. For example, predicting
the price of a house based on its characteristics, or estimating
the CO2 emission from a car's engine.
o Classification technique
 Used for predicting the class or category of a case, for
example, whether a cell is benign or malignant, or whether or not
a customer will churn
o Clustering technique
 Groups similar cases. For example, it can find similar patients,
or it can be used for customer segmentation in the banking field
o Association technique
 Used for finding items or events that often co-occur, for
example, grocery items that are usually bought together by a
particular customer


o Anomaly detection
 Used to discover abnormal and unusual cases, for example, in
credit card fraud detection
o Sequence mining
 Used for predicting the next event, for example, the next click
in a website's click-stream (Markov Model, HMM)
o Dimension reduction
 Used to reduce the size of the data (PCA)
o Recommendation systems
 Associates people's preferences with those of others who have
similar tastes and recommends new items to them, such as books
or movies
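The association technique can be illustrated by counting how often each pair of items is bought together. The sketch below is a minimal illustration in plain Python, assuming a small set of made-up grocery baskets; it computes only pair counts and support (the fraction of baskets containing both items), whereas full association-rule algorithms such as Apriori add candidate pruning and further metrics on top of this kind of counting.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # set() drops repeated items; sorted() gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def support(counts, n_baskets, item_a, item_b):
    """Fraction of all baskets that contain both items."""
    return counts[tuple(sorted((item_a, item_b)))] / n_baskets


# Hypothetical baskets, invented for illustration only
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "chips"],
    ["bread", "butter"],
]

counts = pair_counts(baskets)
print(counts.most_common(2))
print(support(counts, len(baskets), "milk", "bread"))  # 2 of 4 baskets -> 0.5
```

Pairs are stored in sorted order so that ("milk", "bread") and ("bread", "milk") count as the same pair.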

05  Which Machine Learning technique is appropriate for grouping similar
cases in a dataset, for example, to find similar patients or to segment
a bank's customers? W1.2
o Clustering

06  Machine learning algorithms, inspired by the human learning process,
iteratively learn from data and allow computers to find hidden insights,
instead of relying on statements, rules, and methods programmed
explicitly. W1.2
 The model finds these insights without being explicitly programmed to
do so. In essence, machine learning follows the same process that a
4-year-old child uses to learn, understand, and tell apart animals,
objects, etc.


07  Artificial Intelligence W1.2
o AI is a general field with a broad scope, including:
 Computer Vision
 Language processing
 Creativity
 Summarization
 Etc.

 Machine Learning
o ML is a branch of AI that covers the statistical part of artificial
intelligence. It teaches the computer to solve problems by looking at
hundreds or thousands of examples, learning from them, and then using
that experience to solve the same problem in new situations

 Deep Learning
o DL is a very special field of Machine Learning where computers can
actually learn and make intelligent decisions on their own. Deep
Learning involves a deeper level of automation than most ML
algorithms

08 3. PYTHON FOR MACHINE LEARNING W1.3


09  Python W1.3
o Python is a popular and powerful general-purpose programming
language that recently emerged as a preferred language among
data scientists
 NumPy
o It's a math library for working with N-dimensional arrays in Python.
It enables you to do computation efficiently and effectively. It's
better than regular Python for numerical work because of its
capabilities for working with arrays, functions, and data types, and
to work with images you need to know NumPy

 SciPy
o It's a collection of numerical algorithms and toolboxes, including
signal processing, optimization, statistics, and much more. SciPy is a
good library for scientific and high-performance computation

 Matplotlib
o It's a very popular plotting package that provides 2D plotting as
well as 3D plotting

 RECOMMENDATION
o A basic knowledge of these three packages, which are built on
Python, is a good asset for data scientists who want to work with
real-world problems
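The NumPy and SciPy capabilities described above can be sketched briefly as follows, assuming both packages are installed; the array values and the function being minimized are arbitrary examples chosen for illustration. Matplotlib is left out here because plotting needs a display or an output file.

```python
import numpy as np
from scipy import optimize

# NumPy: a 2-D array (3 rows x 2 columns) with vectorized operations, no Python loops
a = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
col_means = a.mean(axis=0)   # mean of each column -> [3.0, 4.0]
scaled = a * 10              # element-wise multiplication applied to every entry

# SciPy: numerically find the x that minimizes f(x) = (x - 3)^2 + 1
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2 + 1.0)

print(a.shape, col_means, scaled[2, 1])
print(result.x)              # close to 3.0
```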

 Pandas


o It's a very high-level Python library that provides high-performance,
easy-to-use data structures. It has many functions for data importing,
manipulation, and analysis. In particular, it offers data structures
and operations for manipulating numerical tables and time series
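A minimal pandas sketch of the table manipulation described above, assuming a small hypothetical car table built in memory (in practice the data would usually be imported from a file, for example with pd.read_csv). It shows row filtering with a boolean mask and a grouped aggregate.

```python
import pandas as pd

# Hypothetical car data, invented for illustration only
df = pd.DataFrame({
    "make": ["A", "A", "B", "B"],
    "engine_size": [1.6, 2.0, 3.0, 3.5],
    "co2_emissions": [150, 170, 240, 260],
})

# Keep only the rows whose engine size is above 2.5 (boolean mask)
large = df[df["engine_size"] > 2.5]

# Average CO2 emissions per make (group, then aggregate)
mean_by_make = df.groupby("make")["co2_emissions"].mean()

print(large)
print(mean_by_make)
```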

10  Scikit-learn W1.3
o It's a free Machine Learning library for the Python programming
language: a collection of algorithms and tools for Machine Learning.
It has most of the common classification, regression, and clustering
algorithms, and it's designed to work with the Python numerical and
scientific libraries [NumPy & SciPy]
o Most of the tasks that need to be done in a Machine Learning
pipeline are already implemented in Scikit-learn, including:
 Preprocessing of data
 Feature selection
 Feature extraction
 Train/test splitting
 Defining the algorithms
 Fitting models
 Tuning parameters
 Prediction
 Evaluation
 Exporting the model


11  1 | Standardize the dataset
 2 | Split the dataset into train and test sets, to train the model and
then test its accuracy
 3 | Set up the algorithm [Example: Support Vector Classification]
 4 | Train the model
 5 | Make predictions
 6 | Evaluate the model's accuracy [Example: Confusion Matrix]
 7 | Save the model
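The seven steps above can be sketched with Scikit-learn as follows. This is a minimal example under stated assumptions: the bundled Iris dataset stands in for the course's data, an SVC with default parameters is the classifier, and Python's pickle module saves the model (joblib is a common alternative). It performs the split before standardizing, so the scaler learns its mean and standard deviation from the training set only and no test-set information leaks into training.

```python
import os
import pickle
import tempfile

from sklearn import preprocessing, svm
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# 2 | Split into train and test sets (done first so the test set stays unseen)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4, stratify=y)

# 1 | Standardization: fit the scaler on the training set, apply it to both sets
scaler = preprocessing.StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# 3 | Set up the algorithm: Support Vector Classification
clf = svm.SVC()

# 4 | Train the model
clf.fit(X_train_s, y_train)

# 5 | Predictions on the unseen test set
y_pred = clf.predict(X_test_s)

# 6 | Evaluate: confusion matrix (rows = true class, columns = predicted class)
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
print(cm)
print(f"accuracy: {acc:.2f}")

# 7 | Save the model to disk and load it back
path = os.path.join(tempfile.mkdtemp(), "svc_model.pkl")
with open(path, "wb") as f:
    pickle.dump(clf, f)
with open(path, "rb") as f:
    restored = pickle.load(f)
```

In a real deployment the fitted scaler would need to be saved alongside the model, since new data must be standardized the same way before prediction.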

12  Why is Scikit-learn a suitable library for Machine Learning?
o Scikit-learn is a free Machine Learning library that works with NumPy
and SciPy
o Scikit-learn implements most of the common Machine Learning algorithms

13 4. SUPERVISED VS UNSUPERVISED LEARNING W1.4

14  Supervised Vs Unsupervised Learning W1.4
o An easy way to begin grasping the concept of Supervised and
Unsupervised Learning is by looking directly at the words that make
them up. To supervise means to observe and direct the execution of a
task, project, or activity. Obviously, we aren't going to supervise a
person; instead, we'll be supervising a Machine Learning model


15  How do we supervise a Machine Learning model? W1.4
o We do this by teaching the model; that is, we load the model with
knowledge so that we can have it predict future instances
 How exactly do we teach a model?
o By training it with some data from a labeled dataset. It's important
to note that the data is labeled
 What does a labeled dataset look like?
o It could look something like the attached image [This example is
taken from a cancer dataset]

16  Components of this table W1.4
o Column labels [Attributes]
o Column data [Features]
o Rows [Observations]
o Data [Can be numerical or categorical]
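To make these components concrete, the sketch below loads Scikit-learn's bundled breast cancer dataset, used here only as a stand-in for the cancer dataset in the attached image (it is not the same table). It separates the column labels (attributes), the column data (features), the rows (observations), and the label column that makes the dataset "labeled". Loading with as_frame=True assumes pandas is installed.

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)
df = data.frame                      # feature columns plus a "target" column

attributes = [c for c in df.columns if c != "target"]   # column labels
features = df[attributes]                               # column data
labels = df["target"]                                   # the label of each row
n_observations = len(df)                                # rows

print(n_observations, len(attributes))
print(features.iloc[:3, :3])          # first 3 observations, first 3 attributes
print(dict(zip(range(2), data.target_names)))   # label codes -> class names
```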


17  There are two supervised learning techniques W1.4
o Classification
o Regression

18  Classification W1.4
o It's the process of predicting discrete class labels or categories


19  Regression W1.4
o It's the process of predicting continuous values, as opposed to
predicting categorical values in classification
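The difference between classification and regression shows up in what a fitted model returns. In this minimal sketch, assuming tiny made-up one-feature data, a logistic regression classifier (a classifier despite its name) predicts one of the known class labels, while a linear regression model predicts a continuous number.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical one-feature data, invented for illustration only
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y_class = [0, 0, 0, 1, 1, 1]                    # discrete class labels
y_value = [1.5, 3.5, 5.5, 7.5, 9.5, 11.5]       # continuous target, y = 2x - 0.5

clf = LogisticRegression().fit(X, y_class)      # classification
reg = LinearRegression().fit(X, y_value)        # regression

print(clf.predict([[5.5]]))   # a class label: 0 or 1
print(reg.predict([[5.5]]))   # a real number: 2 * 5.5 - 0.5 = 10.5
```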

20  Which technique(s) are considered Supervised Learning? W1.4
o Classification
o Regression

21  Unsupervised Learning W1.4
o We don't supervise the model; instead, we let it work on its own to
discover information that may not be visible to the human eye
o The algorithms train on the dataset and draw conclusions from
unlabeled data
o Its algorithms are more difficult than those of supervised learning,
since we know little to no information about the data or the
outcomes that are to be expected
 Most widely used unsupervised Machine Learning techniques
o Dimension reduction
 Dimension reduction and/or feature selection reduce redundant
features to make classification easier


o Density estimation
 It's a very simple concept that is mostly used to explore the data
to find some structure within it
o Market basket analysis
 It's a modeling technique based upon the theory that if you buy
a certain group of items, you're more likely to buy another group
of items
o Clustering
 …

22  Clustering
o It's considered one of the most popular unsupervised machine
learning techniques, used for grouping data points or objects that
are somehow similar
o Cluster analysis has many applications in different domains, whether
it's a bank's desire to segment its customers based on certain
characteristics, or helping an individual organize and group his or
her favorite types of music
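A minimal clustering sketch with Scikit-learn's KMeans, assuming six made-up 2-D points that form two obvious groups (think of two customer segments). No labels are given to the model; it assigns each point to a cluster on its own. The number of clusters (2) is an assumption the user has to supply.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical points, invented for illustration: two well-separated groups
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# n_init=10 runs k-means from 10 different starts and keeps the best result
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(km.labels_)            # cluster index per point (the index numbers are arbitrary)
print(km.cluster_centers_)   # one center per cluster
```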


23  Supervised Vs Unsupervised Learning
o So, to recap, the biggest difference between supervised and
unsupervised learning is that supervised learning deals with labeled
data while unsupervised learning deals with unlabeled data.
o In supervised learning, we have machine learning algorithms for
classification and regression. In unsupervised learning, we have
methods such as clustering

24  Quiz Week 1

25 1. REGRESSION W2


26  In this module, you will get a brief intro to regression. You will
learn about Linear, Non-linear, Simple, and Multiple regression, and
their applications. In the lab part, you will apply all these methods
to two different datasets. You will also learn how to evaluate your
regression model and calculate its accuracy.

 Learning Objectives W2.1
o Demonstrate understanding of the basics of regression.
o Demonstrate understanding of simple linear regression.
o Describe approaches for evaluating regression models.
o Describe evaluation metrics for determining accuracy of regression
models.
o Demonstrate understanding of multiple linear regression.
o Demonstrate understanding of non-linear regression.
o Apply Simple and Multiple Linear Regression to a dataset for
estimation.


27  Regression
o It's the process of predicting continuous values, as opposed to
predicting categorical values in classification.
o As in the attached image, we can use regression methods to predict
a continuous value, such as CO2 emission, using some other variables
o There are two types of variables
 "Y" Dependent variable: can be seen as the state, target, or
final goal we study and try to predict; it should be continuous and
cannot be a discrete value
 "X" Independent variable or variables: also known as explanatory
variables, can be seen as the causes of those states; they can be
measured on either a categorical or continuous measurement scale

28  We use historical data of past cars, take one or more of their
features, and from that data build a model.


29  Simple Regression
o One independent variable is used to estimate a dependent variable.
 Multiple Regression
o More than one independent variable is used to estimate a dependent
variable.
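Simple and multiple regression can be sketched with ordinary least squares. The example below assumes made-up, noise-free car data (engine size, cylinders, CO2 emissions), so the fitted coefficients recover exactly the values used to generate it. The simple case uses the closed-form formulas for one independent variable; the multiple case solves for several coefficients at once with NumPy's least-squares solver.

```python
import numpy as np


def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x (one independent variable)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    b1 = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
          / sum((xi - x_mean) ** 2 for xi in x))
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Hypothetical, noise-free data invented for illustration only
engine = [1.0, 2.0, 3.0, 4.0]
cylinders = [4.0, 4.0, 6.0, 8.0]
co2_simple = [120.0, 160.0, 200.0, 240.0]                 # y = 80 + 40 * engine
co2_multi = [80 + 30 * e + 5 * c for e, c in zip(engine, cylinders)]

# Simple regression: one independent variable (engine size)
b0, b1 = simple_linear_regression(engine, co2_simple)

# Multiple regression: two independent variables; the column of ones gives the intercept
A = np.column_stack([np.ones(len(engine)), engine, cylinders])
coef, *_ = np.linalg.lstsq(A, np.array(co2_multi), rcond=None)

print(b0, b1)     # 80.0 40.0
print(coef)       # approximately [80, 30, 5]
```

With real, noisy data the coefficients would only approximate the underlying relation, and the fit would be judged with evaluation metrics such as those listed in the module objectives.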

 Depending on the relation between the dependent and independent
variables, both simple and multiple regression can be either linear or
non-linear
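The linear versus non-linear distinction can be illustrated by fitting two models to the same data. This sketch assumes synthetic, noise-free data generated from the exponential curve y = 2 * exp(0.5 * x). A straight line fitted with np.polyfit leaves a large error, while SciPy's curve_fit estimates the parameters a and k of the non-linear model y = a * exp(k * x) directly. The starting guess p0 is an assumption, because non-linear fitting is iterative and needs one.

```python
import numpy as np
from scipy.optimize import curve_fit

# Synthetic data following y = 2 * exp(0.5 * x)
x = np.linspace(0.0, 5.0, 20)
y = 2.0 * np.exp(0.5 * x)

# Linear regression: best straight line (np.polyfit returns the slope first)
slope, intercept = np.polyfit(x, y, 1)
linear_sse = np.sum((y - (intercept + slope * x)) ** 2)


# Non-linear regression: model y = a * exp(k * x), parameters a and k
def exp_model(x, a, k):
    return a * np.exp(k * x)


params, _ = curve_fit(exp_model, x, y, p0=(1.0, 0.3))
a, k = params
nonlinear_sse = np.sum((y - exp_model(x, a, k)) ** 2)

print(f"linear SSE: {linear_sse:.3f}, non-linear SSE: {nonlinear_sse:.3g}")
print(a, k)   # close to 2.0 and 0.5
```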

30  Sales forecasting
o Predict a salesperson's total yearly sales from independent variables
such as age, education, and years of experience.
 Psychology
o Determine individual satisfaction based on demographic and
psychological factors.
 Real estate
o Predict the price of a house in an area based on its size, number of
bedrooms, and so on.
 Employment income
o Predict employment income from independent variables such as hours
of work, education, occupation, sex, age, years of experience, and so
on
 Other fields or domains


o Finance, healthcare, retail, etc.

31  There are many regression algorithms, each with its own importance
and specific conditions under which its application is best suited.
While this course covers just a few of them, it gives you enough base
knowledge to explore other regression techniques.

