
Introduction to Machine Learning

PREPARED BY: LUCY FELIX-SADIWA


Image retrieved from https://www.javatpoint.com/subsets-of-ai
What is Data Science?
• Data science combines multiple fields, including statistics, scientific
methods, artificial intelligence (AI), and data analysis, to extract
value from data. Those who practice data science are called data
scientists, and they combine a range of skills to analyze data
collected from the web, smartphones, customers, sensors, and
other sources to derive actionable insights. (retrieved from
https://www.oracle.com/ph/data-science/what-is-data-science/)
[Figure: relationship among Artificial Intelligence, Machine Learning, Neural Networks, Deep Learning, Data Science, Data Mining, and Big Data]
Analytics
• Analytics is the process of discovering, interpreting, and communicating significant
patterns in data. Quite simply, analytics helps us see insights and meaningful data that we
might not otherwise detect. Business analytics focuses on using insights derived from data
to make more informed decisions that will help organizations increase sales, reduce costs,
and make other business improvements.
https://www.oracle.com/ph/business-analytics/what-is-analytics/
• Four types:
• Descriptive Analytics - what happened in the past
• Diagnostic Analytics - why something happened in the past
• Predictive Analytics - what is most likely to happen in the future
• Prescriptive Analytics - which actions you can take to affect those likely outcomes
Descriptive Analytics
• Any activity or method that helps us to describe or summarize raw data into
something interpretable by humans
• Allows us to learn from past behaviors and understand how they might influence future
outcomes
• Examples:
• A company’s business intelligence reports that cover different aspects of the organization to provide
historical insights regarding the company’s production, operations, sales, revenue, financials, inventory,
customers, and market share
• The sales team can learn which customer segments generated the highest peso amount in sales last year.
• The marketing team can uncover which social media platforms delivered the best return on advertising
investment last quarter.
• The finance team can track month-over-month and year-over-year revenue growth or decline.
• Operations can track demand for SKUs (Stock Keeping Units) across geographic locations throughout the
past year.
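As a minimal sketch of descriptive analytics, the Python code below summarizes hypothetical sales records by customer segment (like the sales-team example) and computes month-over-month revenue change (like the finance-team example). The records, segment names, and peso amounts are invented for illustration; a real report would pull them from the company's own data sources.

```python
from collections import defaultdict

# Hypothetical sales records: (customer_segment, amount_in_pesos)
sales = [
    ("Retail", 12000.0),
    ("Corporate", 45000.0),
    ("Retail", 8000.0),
    ("Government", 30000.0),
    ("Corporate", 15000.0),
]

def total_by_segment(records):
    """Describe the past: total peso amount per customer segment."""
    totals = defaultdict(float)
    for segment, amount in records:
        totals[segment] += amount
    return dict(totals)

def percent_change(previous, current):
    """Month-over-month growth (positive) or decline (negative), in percent."""
    return round((current - previous) / previous * 100, 1)

totals = total_by_segment(sales)
print(totals)                        # {'Retail': 20000.0, 'Corporate': 60000.0, 'Government': 30000.0}
print(max(totals, key=totals.get))   # Corporate

# Hypothetical monthly revenue in pesos
monthly_revenue = [100_000, 110_000, 99_000]
growth = [percent_change(a, b) for a, b in zip(monthly_revenue, monthly_revenue[1:])]
print(growth)                        # [10.0, -10.0]
```

Nothing here predicts or explains anything; the code only condenses raw records into figures a person can read, which is what makes it descriptive.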
Diagnostic Analytics
• Examines data or information to answer the question “Why did it happen?”
• Techniques: Drill-down, data discovery, data mining, correlations, causations
• Provides a very good understanding of a limited piece of the problem you want to solve
• Labor intensive – human intervention is required to perform drill-down or data mining
to go deeper into the data to understand why something happened or its root cause. It
focuses on determining the factors and events that contributed to the outcome.
• Examples:
• When sales of a product line decline in some stores, the product manager may look back at past trends and
patterns in the product line’s sales across stores based on its placement (floor, corner, aisle) within
each store. The manager may also examine external factors such as demographics, season, and other
conditions.
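The drill-down described above can be sketched as grouping the same sales data at successively finer levels. The store names, placements, and unit counts below are hypothetical, and `drill_down` is an illustrative helper, not a library function. Locating where a decline is concentrated is only a first step; confirming the root cause still needs the human investigation described above.

```python
from collections import defaultdict

# Hypothetical unit sales of one product line: (store, placement, quarter, units)
records = [
    ("Store A", "aisle", "Q1", 120), ("Store A", "aisle", "Q2", 118),
    ("Store B", "corner", "Q1", 100), ("Store B", "corner", "Q2", 60),
    ("Store C", "floor", "Q1", 90), ("Store C", "floor", "Q2", 88),
    ("Store D", "corner", "Q1", 80), ("Store D", "corner", "Q2", 50),
]

def drill_down(rows, key_index):
    """Group rows by the column at key_index and return the Q2 - Q1 change in units."""
    change = defaultdict(int)
    for row in rows:
        key, quarter, units = row[key_index], row[2], row[3]
        change[key] += units if quarter == "Q2" else -units
    return dict(change)

by_store = drill_down(records, 0)      # level 1: which stores declined?
by_placement = drill_down(records, 1)  # level 2: does placement explain it?
print(by_store)       # {'Store A': -2, 'Store B': -40, 'Store C': -2, 'Store D': -30}
print(by_placement)   # {'aisle': -2, 'corner': -70, 'floor': -2}
```

In this made-up data the decline is concentrated in corner placements, which points the manager to a hypothesis to check rather than a proven cause.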
Predictive Analytics
• Ability to make predictions or estimations of likelihood about unknown future
events based on the past or historic patterns.
• Gives insights into “What might happen?”
• Uses techniques from data mining, statistics, modeling, machine learning, and
AI to analyze current data to make predictions about the future.
• The foundation of predictive analytics is based on probabilities, and the
quality of predictions by statistical algorithms depends heavily on the quality of
the input data. Predictions are therefore never guaranteed to be 100% accurate.
• Examples: Weather Forecasting, e-mail spam identification, fraud detection,
probability of customer purchasing a product or renewal of insurance policy,
predicting the chances that a person has a known illness, etc.
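To show how a prediction rests on probabilities learned from past data, here is a minimal sketch of e-mail spam identification using multinomial Naive Bayes (one probabilistic technique among many; the slide does not prescribe it). The four training e-mails are hypothetical, and `fit`/`predict_proba` are illustrative names, not a library API.

```python
import math
from collections import Counter

# Hypothetical labelled e-mails: the historic data the prediction is based on
train = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project report attached", "ham"),
]

def fit(examples):
    """Count words per class and documents per class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def predict_proba(text, model):
    """P(spam | text) under multinomial Naive Bayes with add-one (Laplace) smoothing."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)  # log prior
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Turn the log scores into probabilities that sum to 1
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights["spam"] / sum(weights.values())

model = fit(train)
print(round(predict_proba("free money", model), 2))       # 0.77
print(round(predict_proba("project meeting", model), 2))  # 0.22
```

The output is a likelihood, not a certainty: "free money" is judged probably spam only because "money" appeared in past spam, and a small or unrepresentative training set would make that estimate unreliable.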
Prescriptive Analytics
• Area of data or business analytics dedicated to finding the best course of action for
a given situation.
• Endeavors to measure the future decision’s effect to enable the decision makers to
foresee the possible outcomes before the actual decisions are made.
• Combines business rules, machine learning algorithms, and tools that can be
applied to historical and real-time data feeds.
• Key objective: not just to predict what will happen but also why it will happen, by
predicting multiple futures under different scenarios so that companies can
assess possible outcomes based on their actions.
• Examples: simulations in design situations to help users identify system behaviors
under different configurations and ensure that all key performance metrics are
met such as wait times, queue length, etc.
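A toy version of the simulation example: the sketch below simulates a first-come, first-served queue, assuming exponentially distributed inter-arrival and service times, tries configurations with 1 to 10 servers, and recommends the smallest one whose average simulated wait meets a target. The arrival rate, service rate, wait target, and function names are all hypothetical.

```python
import random

def simulate_avg_wait(num_servers, arrival_rate, service_rate, num_customers=5000, seed=42):
    """Simulate a first-come, first-served queue and return the mean waiting time."""
    rng = random.Random(seed)          # same seed, so every configuration sees the same customers
    free_at = [0.0] * num_servers      # time at which each server next becomes free
    clock = 0.0
    total_wait = 0.0
    for _ in range(num_customers):
        clock += rng.expovariate(arrival_rate)                      # next customer arrives
        server = min(range(num_servers), key=lambda s: free_at[s])  # earliest-available server
        start = max(clock, free_at[server])
        total_wait += start - clock
        free_at[server] = start + rng.expovariate(service_rate)     # service completion time
    return total_wait / num_customers

def recommend_servers(target_wait, arrival_rate, service_rate, max_servers=10):
    """Prescriptive step: cheapest configuration whose simulated wait meets the target."""
    for servers in range(1, max_servers + 1):
        if simulate_avg_wait(servers, arrival_rate, service_rate) <= target_wait:
            return servers
    return None

# Hypothetical scenario: 4 customers per minute, each server handles 1.5 per minute
print(recommend_servers(target_wait=1.0, arrival_rate=4.0, service_rate=1.5))
```

The predictive part is the simulation of each scenario; the prescriptive part is the rule that turns those simulated outcomes into a recommended action.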
What is Machine Learning?
• Arthur Samuel (1959)
• Machine Learning is a field of study that gives computers the ability to learn without being explicitly
programmed
• Tom Mitchell (1997)
• A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as measured by P, improves with experience
E.
• Machine learning is a field of computer science that involves using statistical methods to
create programs that either improve performance over time, or detect patterns in
massive amounts of data that humans would be unlikely to find.
• Machine learning explores the study and construction of algorithms that can learn from
and make predictions on data. Such algorithms operate by building a model from
example inputs in order to make data-driven predictions or decisions rather than
following strictly static program instructions.
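To make Mitchell's T, E, and P concrete with a task other than the spam quiz, here is a minimal sketch under hypothetical assumptions. The task T is classifying numbers in [0, 1] as class 1 if x >= 0.5 (a rule the learner does not know), the experience E is a stream of labelled examples, and the performance P is accuracy on a fixed test grid. The threshold learner and all names are illustrative.

```python
import random

def make_examples(n, seed=0):
    """Hypothetical experience E: random points in [0, 1], labelled 1 if x >= 0.5, else 0."""
    rng = random.Random(seed)
    return [(x, int(x >= 0.5)) for x in (rng.random() for _ in range(n))]

def learn_threshold(examples):
    """Learn a decision threshold halfway between the two classes seen so far."""
    largest_neg = max((x for x, y in examples if y == 0), default=0.0)
    smallest_pos = min((x for x, y in examples if y == 1), default=1.0)
    return (largest_neg + smallest_pos) / 2

def performance(threshold, test_points):
    """Performance measure P: fraction of test points classified correctly."""
    correct = sum(int(x >= threshold) == y for x, y in test_points)
    return correct / len(test_points)

# Task T, evaluated on a fixed grid of 1000 test points
test_points = [(i / 1000, int(i / 1000 >= 0.5)) for i in range(1000)]
stream = make_examples(500)
for n in (5, 50, 500):
    threshold = learn_threshold(stream[:n])
    print(n, round(performance(threshold, test_points), 3))
```

As the amount of experience grows, the gap between the two classes narrows around 0.5, so P typically rises toward 1.0; that improvement with E is exactly what the definition calls learning.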
A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance
at tasks in T, as measured by P, improves with experience E.

• Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam.
What is the task T in this setting?
A. Classifying emails as spam or not spam
B. Watching you label emails as spam or not spam
C. The number (or fraction) of emails correctly classified as spam/not spam.
D. None of the above – This is not a machine learning problem.
• How the driverless car sees the world (https://www.youtube.com/watch?v=tiwVMrTLUWg&t=754s)
• Video recommendations on YouTube
• Posts, ads, and videos on social media such as Facebook
Categories of Machine Learning
Image retrieved from https://vitalflux.com/great-mind-maps-for-learning-machine-learning/
Supervised Learning
• The machine learning algorithm is provided with a sufficiently large
example input dataset and the respective output or event/class, usually
prepared in consultation with a subject matter expert in the
respective domain.
• Goal: Learn patterns in the data and build general rules to map the input
to the output, class, or event.
• Two types
• Regression - The output to be predicted is a continuous number
related to the given input.
• Classification - The output to be predicted is the actual class/event or the probability
of each class/event; there can be two classes (binary) or more (multiclass).
(Loukas, 2020) retrieved from https://towardsdatascience.com/what-is-machine-learning-a-short-note-on-supervised-unsupervised-semi-supervised-and-aed1573ae9bb
Example by Andrew Ng on Machine Learning Course
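The two supervised types can be sketched side by side in plain Python, loosely in the spirit of the course's house-price (regression) and tumor-size (classification) examples. All data points are hypothetical. Regression here is a one-feature least-squares line, and classification uses k-nearest neighbors with k = 3. Both are illustrative choices, not the only options.

```python
from collections import Counter

def fit_linear(xs, ys):
    """Regression: least-squares slope and intercept for one input feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def knn_classify(train_x, train_y, x, k=3):
    """Classification: majority label among the k nearest training examples."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical labelled data: each input is paired with its known output
sizes_sqm = [50, 80, 100, 120]
prices_millions = [2.5, 4.0, 5.0, 6.0]           # continuous output -> regression
slope, intercept = fit_linear(sizes_sqm, prices_millions)
print(round(slope * 90 + intercept, 2))          # predicted price for 90 sqm: 4.5

tumor_cm = [1.0, 1.5, 2.0, 4.0, 4.5, 5.0]
malignant = [0, 0, 0, 1, 1, 1]                   # discrete class -> classification
print(knn_classify(tumor_cm, malignant, 4.2))    # 1
print(knn_classify(tumor_cm, malignant, 1.2))    # 0
```

In both cases the labelled outputs are what make the learning "supervised": the algorithm is told the right answer for every training input.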
Unsupervised Learning
• Studies the patterns in the input dataset to gain a better understanding and to identify
similar patterns that can be grouped into specific classes or events. It does not
require any intervention from subject matter experts beforehand.
• Examples of Unsupervised Learning
• Clustering - The goal is to divide the input dataset into logical groups of related items.
Examples: grouping news articles, grouping customers based on their profiles, etc.
• Dimensionality Reduction - The goal is to simplify a large input dataset by mapping it to a
lower-dimensional space. Example: when analyzing a high-dimensional dataset, you may want to
find the key variables that hold a significant percentage (say 95%) of the information and use
only those for analysis.
• Anomaly Detection - also known as Outlier Detection, is the identification of items or observations
that do not conform to an expected pattern or behavior in comparison with other items in a given
dataset. Examples: machine or system health monitoring, event detection, fraud/intrusion
detection. It is a major application area in the Internet of Things (IoT), where it enables detection
of abnormal behavior in a given context.
(Loukas, 2020) retrieved from https://towardsdatascience.com/what-is-machine-learning-a-short-note-on-supervised-unsupervised-semi-supervised-and-aed1573ae9bb
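Two of the unsupervised examples can be sketched without any labels: plain k-means for clustering, and a simple z-score rule for anomaly detection (one basic method among many). The customer profiles and sensor readings are hypothetical, and the 2.0 threshold is an assumption chosen for this small sample. Dimensionality reduction is left out because it needs a linear-algebra library.

```python
import math
import random
import statistics

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return centroids, clusters

def zscore_outliers(values, threshold=2.0):
    """Flag values more than `threshold` population standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]

# Hypothetical customer profiles: (annual spend in thousands of pesos, visits per month)
customers = [(1.0, 2.0), (1.5, 1.8), (1.0, 1.0), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = kmeans(customers, k=2)
print(clusters)

# Hypothetical temperature readings from a sensor; one reading is abnormal
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 35.0, 20.2]
print(zscore_outliers(readings))  # [35.0]
```

No subject matter expert labelled anything here: the groups and the outlier emerge only from the structure of the inputs.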
Reinforcement Learning - Map situations to actions that yield the maximum final reward.
• Not only the immediate reward but also the next and all subsequent rewards.
• Errors act as rewards or penalties.
• If the error is big, then the penalty is high and the reward is low.
• Reward feedback is required for the model to learn which action is best; this is known as “the reinforcement signal”.
(Loukas, 2020) retrieved from https://towardsdatascience.com/what-is-machine-learning-a-short-note-on-supervised-unsupervised-semi-supervised-and-aed1573ae9bb
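Q-learning is one standard reinforcement learning algorithm (the slide does not name a specific one). The sketch below assumes a hypothetical five-cell corridor where the agent starts in cell 0 and receives a reward of 1.0 only on reaching cell 4. The discount factor `gamma` is what lets the value of an action include "the next and all subsequent rewards", not just the immediate one. All parameter values are illustrative.

```python
import random

N_STATES = 5        # hypothetical corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # move left or move right

def step(state, action):
    """Environment: move, stay inside the corridor, reward 1.0 only on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: sometimes explore at random, otherwise pick a best-valued action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Reinforcement signal: immediate reward plus discounted best future value
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]: always move right toward the goal
```

Moving right from cell 0 earns no immediate reward, yet the agent learns to prefer it because the discounted future reward propagates back from the goal.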
Workflow of a Machine Learning Project

(Pant, 2019) retrieved from https://towardsdatascience.com/workflow-of-a-machine-learning-project-ec1dba419b94


Machine Learning Workflow

(Pant, 2019) retrieved from https://towardsdatascience.com/workflow-of-a-machine-learning-project-ec1dba419b94
• Gathering data
• Data pre-processing
• Researching the model that will be best for the type of data
• Training and testing the model
• Evaluation

(Maheshwari, 2018) retrieved from https://medium.datadriveninvestor.com/machine-learning-project-workflow-8137a401ed81
• Gathering the data.
• Preparation of Data.
• EDA (Exploratory Data Analysis).
• Feature Engineering and Selection.
• Choosing the best model.
• Training our model.
• Evaluating the model.
• Performing Hyper Parameter Tuning on the model.
• Interpreting the model results.
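The main workflow steps can be sketched end to end in plain Python: gather data, split it, train a model, tune a hyperparameter on a validation set, and evaluate once on held-out test data. Everything here is a simplifying assumption. The data are synthetic, the model is k-nearest neighbors, the only hyperparameter tuned is k, and pre-processing, EDA, and feature engineering are omitted.

```python
import random
from collections import Counter

def gather_data(n=200, seed=1):
    """Hypothetical data: two noisy 1-D classes centred at 0.0 and 2.0."""
    rng = random.Random(seed)
    data = [(rng.gauss(0.0, 0.7), 0) for _ in range(n // 2)]
    data += [(rng.gauss(2.0, 0.7), 1) for _ in range(n // 2)]
    rng.shuffle(data)
    return data

def split(data, train_frac=0.6, val_frac=0.2):
    """Split into training, validation (for tuning), and test (for final evaluation) sets."""
    a = int(len(data) * train_frac)
    b = a + int(len(data) * val_frac)
    return data[:a], data[a:b], data[b:]

def knn_predict(train, x, k):
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, rows, k):
    return sum(knn_predict(train, x, k) == y for x, y in rows) / len(rows)

data = gather_data()                                   # 1. gather data
train, val, test = split(data)                         # 2. prepare / split
best_k = max((1, 3, 5, 7, 9),                          # 3-4. train and tune k
             key=lambda k: accuracy(train, val, k))
test_acc = accuracy(train + val, test, best_k)         # 5. evaluate on unseen data
print(best_k, round(test_acc, 2))
```

Keeping the test set untouched until the last step is the design choice that matters here: tuning on it would make the reported accuracy optimistic.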


Knowledge Discovery in Databases (KDD)
Cross-Industry Standard Process for Data Mining (CRISP-DM)
Analytics Solution Unified Method for Data Mining/Predictive Analytics (ASUM-DM)
Python’s Data Analysis Packages
• NumPy - Core library for scientific computing. Its built-in mathematical
functions enable fast, vectorized computation and support
multidimensional arrays and large matrices. It is also used for linear algebra.
NumPy arrays are often preferred over Python lists because they use less memory
and are more convenient and efficient.
• Scikit-Learn - one of the most widely used machine learning libraries in Python,
built on NumPy, SciPy, and Matplotlib.
• Matplotlib - a comprehensive library for creating static, interactive, and
animated visualizations in Python.
• Pandas - a library primarily used for data analysis, data manipulation, and
data cleaning.
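A short sketch of the NumPy claims above, assuming NumPy is installed: a vectorized operation replaces an explicit loop, the array stores raw 8-byte integers instead of separate Python objects, and `numpy.linalg.solve` handles a small linear system. The memory figure for the list is an approximation computed with `sys.getsizeof`.

```python
import sys
import numpy as np

n = 100_000
values = list(range(n))              # Python list: pointers to separate int objects
arr = np.arange(n, dtype=np.int64)   # NumPy array: one contiguous block of 8-byte integers

# Element-wise squaring: a list comprehension vs. one vectorized expression
squared_list = [v * v for v in values]
squared_arr = arr * arr

# Approximate memory: the list's pointer array plus each int object it references
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print(arr.nbytes, list_bytes)        # the array needs far fewer bytes

# Linear algebra: solve 2x + y = 3 and x + 3y = 5
a = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
solution = np.linalg.solve(a, b)
print(solution)                      # approximately [0.8 1.4]
```

Pandas builds its DataFrame columns on top of arrays like these, which is why the two libraries are usually used together.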
Commonly used Algorithms

CLASSIFICATION
• K-Nearest Neighbor
• Naive Bayes
• Decision Trees/Random Forest
• Support Vector Machine
• Logistic Regression
• Ensemble Methods

REGRESSION
• Linear Regression
• Support Vector Regression
• Decision Trees/Random Forest
• Gaussian Process Regression

(Pant, 2019) retrieved from https://towardsdatascience.com/workflow-of-a-machine-learning-project-ec1dba419b94
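As an example of one algorithm from the classification list, here is a minimal sketch of logistic regression trained by batch gradient descent on the log-loss, using a single feature. The study-hours data are hypothetical, and the learning rate and epoch count are illustrative. The feature is mean-centred, which is a common trick to help gradient descent converge.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Batch gradient descent on the log-loss of one-feature logistic regression."""
    mean = sum(xs) / len(xs)      # centre the feature to help convergence
    cx = [x - mean for x in xs]
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(cx, ys)]  # predicted prob - label
        w -= lr * sum(e * x for e, x in zip(errors, cx)) / n
        b -= lr * sum(errors) / n
    return w, b, mean

def predict_proba(model, x):
    """Probability of class 1 for input x."""
    w, b, mean = model
    return sigmoid(w * (x - mean) + b)

# Hypothetical data: hours studied -> passed the exam (1) or not (0)
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]
model = train_logistic(hours, passed)
print(round(predict_proba(model, 4.0), 2), round(predict_proba(model, 1.0), 2))
```

Despite its name, logistic regression is a classifier: it outputs a probability, and thresholding that probability (commonly at 0.5) gives the predicted class.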
