
MICE-5002: Advanced Machine Learning

Instructed by
Md. Manowarul Islam
Associate Professor, Dept. of Computer Science & Engineering
Jagannath University, Dhaka-1100
Today’s topics
• Syllabus
• Introduction
• Exam
• Book

Md. Manowarul Islam, Dept. of CSE, Jagannath University, Dhaka-1100


Exam
• Attendance: 10
• Assignment: 10
• Class Test: 10
• Mid Exam: 20
• Final Exam: 50
• Total: 100



Syllabus
• Introduction
• Data pre-processing
• Association rule mining
• Classification (supervised learning)
• Clustering (unsupervised learning)
• Post-processing of data mining results
• Text mining
• Partial/Semi-supervised learning
• Introduction to Web mining



Outline
• Artificial Intelligence
• Data Science
• Machine Learning



What is knowledge?

❖ Knowledge is a theoretical or practical understanding of a subject or a domain.
❖ Knowledge is the sum of what is currently known.



What is Intelligence?

❖ Someone’s intelligence is their ability to understand and learn things.

❖ Intelligence is the ability to think and understand instead of doing things by instinct or automatically.

❖ This definition is flexible: it does not specify whether it is someone or something that has the ability to think and understand.



What is Thinking?

❖ Thinking is the activity of using your brain to consider a problem or to create an idea.
❖ In order to think, someone or something has to have a brain – an
organ that enables someone or something to learn and understand
things, to solve problems and to make decisions.



Intelligence?

❖ So we can now define intelligence as the ability to learn and understand, to solve problems, and to make decisions.



Intelligent Machine?
The Turing imitation game



Artificial Intelligence
• We call ourselves Homo sapiens (man the wise) because our intelligence is so important to us.
• For thousands of years, we have tried to understand how we think and act:
• how our brain, a mere handful of matter, can perceive, understand, predict, and manipulate a world far larger and more complicated than itself.
• Artificial intelligence (AI) is the branch of CS concerned with not just understanding but also building intelligent entities:
• machines that can compute how to act effectively and safely in a wide variety of novel situations.



Brain Vs Machine/Computer
❑ Machines attempt to simulate the operation of the human brain.
❑ A computer can do some things better than a human can. For example:
❑ Adding a thousand four-digit numbers
❑ Drawing complex 3D images
❑ Storing and retrieving massive amounts of data
❑ However, there are things humans can do much better, e.g., creative writing, recognition, and so on.
❑ We can create computer intelligence through programming, just as people become intelligent by learning.



What is AI?
Views of AI fall into four categories:
1. Thinking humanly
2. Acting humanly
3. Thinking rationally
4. Acting rationally



What is AI?
(Thinking Humanly)

1 Thinking humanly: The cognitive modeling approach
❖ A given program must have some way of determining how humans think. We need to get inside the actual workings of human minds. There are two ways to do this:
• Through introspection: trying to catch our own thoughts as they go by.
• Through psychological experiments: e.g., studying a child’s cognitive development.



What is AI?
(Acting Humanly)

2 Acting humanly: The Turing test approach
Intelligent machines must possess the following capabilities:
• Natural language processing
• Knowledge representation
• Automated reasoning
• Machine learning
To pass the total Turing test, the intelligent machine will also need:
• Computer vision to perceive objects
• Robotics to manipulate objects and move about



What is AI?
(Thinking Rationally)

3 Thinking rationally: The laws of thought approach
❖ Syllogisms
Socrates is a man;
All men are mortal;
Therefore, Socrates is mortal.
❖ Logic
These laws of thought were supposed to govern the operation of the mind. Their study initiated the field called logic.



What is AI?
(Acting Rationally)

4 Acting rationally: The rational agent approach
Rational behavior: doing the right thing.
The right thing: that which is expected to maximize goal achievement, given the available information.



Artificial Intelligence (cont…)
• Four approaches of AI
• Acting humanly
• natural language processing to communicate successfully in a human language;
• knowledge representation to store what it knows or hears;
• automated reasoning to answer questions and to draw new conclusions;
• machine learning to adapt to new circumstances and to detect and extrapolate patterns
• Thinking humanly
• introspection—trying to catch our own thoughts as they go by;
• psychological experiments—observing a person in action;
• brain imaging—observing the brain in action.
• Thinking rationally
• logic, probability
• Acting rationally
• agent
Artificial Intelligence(cont…)
• Applications of AI

Source: https://leverageedu.com/blog/applications-of-artificial-intelligence/



Artificial Intelligence(cont…)
• AI in COVID-19 pandemic



Artificial Intelligence(cont…)
• Impact of AI on employment



Artificial Intelligence(cont…)
• Major Areas of AI



Introduction to data
• Example:
10, 25, …, Kharagpur, 10CS3002,
namo@gov.in
Anything else?

• Data vs. Information


100.0, 0.0, 250.0, 150.0, 220.0, 300.0, 110.0

Is there any information?



How large is your data?
• What is the maximum file size you have dealt with so far?
• Movies/files/streaming video that you have used?
• What is the maximum download speed you get?
• To retrieve data stored in distant locations?
• How fast is your computation?
• How much time does it take just to transfer the data, process it, and get the result?



Growth of data



Sources of data
• “Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone.”
• The data come from several sources:
• sensors used to gather climate information
• posts to social media sites
• digital pictures and videos
• purchase transaction records
• cell phone GPS signals
…to name a few!



Who is generating Big Data?
• Social
• User Tracking & Engagement
• Homeland Security
• eCommerce
• Financial Services
• Real Time Search



Data Rich but Information Poor

• Databases are too big (“Terrorbytes”)
• Data mining can help discover knowledge

What can you do with the data?
• Suppose that you are the owner of a supermarket and you have collected billions of market basket records. What information would you extract from them, and how would you use it?

Product placement

Catalog creation

Recommendations
• What if this was an online store?



What can you do with the data?
• Suppose you are a biologist who has microarray expression data: thousands of genes, and their expression values over thousands of different settings (e.g., tissues). What information would you like to get out of your data?

Groups of genes and tissues



Why Mine Data? Commercial Viewpoint
• Lots of data is being collected
and warehoused
• Web data, e-commerce
• purchases at department/
grocery stores
• Bank/Credit Card
transactions
• Computers have become cheaper and more powerful
• Competitive Pressure is Strong
• Provide better, customized services for an edge (e.g. in Customer
Relationship Management)



Why Mine Data? Scientific Viewpoint
• Data collected and stored at
enormous speeds (GB/hour)
• remote sensors on a satellite
• telescopes scanning the skies
• microarrays generating gene
expression data
• scientific simulations
generating terabytes of data
• Traditional techniques infeasible for raw data
• Data mining may help scientists
• in classifying and segmenting data
• in Hypothesis Formation



Data Science
• Field of study that uses mathematics, statistics, programming, and
domain knowledge to extract meaningful insights from data

Source: https://becominghuman.ai/top-data-science-applications-how-data-science-bought-change-to-the-world-e215c3b25d9d
Data science (cont….)
• Who is a Data Scientist?
• A data scientist cracks hidden problems by combining strong expertise in a certain scientific discipline with mathematics, statistics, and computer science.



Data Science (cont…)
• Important Applications and Examples
• Healthcare: Healthcare companies are using data science to build sophisticated
medical instruments to detect and cure diseases.
• Gaming: Video and computer games are now being created with the help of data
science and that has taken the gaming experience to the next level.
• Image Recognition: Identifying patterns in images and detecting objects in an image
is one of the most popular data science applications.
• Recommendation Systems: Netflix and Amazon give movie and product
recommendations based on what you like to watch, purchase, or browse on their
platforms.
• Logistics: Data Science is used by logistics companies to optimize routes to ensure
faster delivery of products and increase operational efficiency.
• Fraud Detection: Banking and financial institutions use data science and related
algorithms to detect fraudulent transactions.



Data Science (cont…)
• Role of data science in the COVID-19 pandemic
• Predicting COVID-19 Trends and Hotspots
• AI-driven Informatics, Sensing, Imaging and Big Data Analytics for Fighting the
COVID-19 Pandemic



Data Science (cont…)
• Languages of data science
• Python, R, and SQL are the languages to consider first and foremost;
• there are many others with their own strengths and features; Scala, Java, C++, and Julia are some of the most popular.
• Tools used in data science



Machine Learning (ML)
• What is learning?
• The acquisition of knowledge or skills through study, experience, or being
taught
• The process of acquiring new, or modifying existing, knowledge, behaviors,
skills, values, or preferences
• In other words, the process of converting experience into expertise or knowledge



What is Machine Learning?
• [Arthur Samuel, 1959]
• Field of study that gives computers
• the ability to learn without being explicitly programmed

• [Kevin Murphy] algorithms that


• automatically detect patterns in data
• use the uncovered patterns to predict future data or other outcomes of interest

• [Tom Mitchell] algorithms that


• improve their performance (P)
• at some task (T)
• with experience (E)

(C) Dhruv Batra
What is Machine Learning?
• If you are a Scientist
Data → Machine Learning → Understanding

• If you are an Engineer / Entrepreneur


• Get lots of data
• Machine Learning
• ???
• Profit!

(C) Dhruv Batra
Why Study Machine Learning?
• Develop systems
• too difficult/expensive to construct manually
• because they require specific detailed skills/knowledge
• knowledge engineering bottleneck

• Develop systems
• that adapt and customize themselves to individual users.
• Personalized news or mail filter
• Personalized tutoring

• Discover new knowledge from large databases


• Medical text mining (e.g. migraines to calcium channel blockers to magnesium)
• data mining

Slide Credit: Ray Mooney
Traditional Programming

int square(int x)
{
return x*x;
}

Not an efficient approach when the mathematical description of the rules becomes complex!



Write a program to play the Rock-Paper-Scissors game!

Write a program for speech to text conversion.



What is Machine Learning?
int square(int x)
{
return x*x;
}
Not an efficient approach when the rules (mathematical description) become complex!

Set Column_1 Column_2
Training 2 4
Training 11 121
Training 25 625
Test 7 ?
Test 21 ?

• Definition:
“Field of study that gives computers the ability to learn without being explicitly programmed.” Arthur Samuel (1959)
Why ML?
• There is no need to “learn” to calculate payroll.
• Learning is used for:
• Tasks performed by animals/humans, e.g., driving, speech recognition, and image understanding.
• Tasks beyond human capabilities, e.g., analyzing astronomical data, turning medical archives into medical knowledge, weather prediction, analysis of genomic data, Web search engines, and electronic commerce.
• Adaptability: the system must keep producing the desired output as its inputs change.



Problems ML Can Solve
• Identifying the zip code from handwritten digits on an envelope
• Here the input is a scan of the handwriting, and the desired output is the actual
digits in the zip code. To create a dataset for building a machine learning model, you
need to collect many envelopes. Then you can read the zip codes yourself and store
the digits as your desired outcomes.
• Determining whether a tumor is benign (not cancerous) based on a medical
image
• Here the input is the image, and the output is whether the tumor is benign. To
create a dataset for building a model, you need a database of medical images. You
also need an expert opinion, so a doctor needs to look at all of the images and
decide which tumors are benign and which are not. It might even be necessary to
do additional diagnosis beyond the content of the image to determine whether the
tumor in the image is cancerous or not
• Detecting fraudulent activity in credit card transactions



Types of ML



Supervised and unsupervised learning:
• Supervised learning:
• Given a set of feature/label pairs, find a rule that predicts the label associated with a previously unseen set of features.
Examples-
• Regression
• Classification



Classification
Training set:
Name Matches Runs_scored Wickets_taken Label
Shikhar Dhawan 136 5688 0 Batsman
Rohit Sharma 224 9115 8 Batsman
K. L. Rahul 32 1239 0 Batsman
Virat Kohli 248 11867 0 Batsman
M. S. Dhoni 350 10773 1 Batsman
Kedar Jadhav 73 1389 27 Batsman
Bhuvneshwar Kumar 114 526 132 Bowler
Mohammed Shami 77 147 114 Bowler
Jasprit Bumrah 64 19 104 Bowler
Kuldeep Yadav 60 118 104 Bowler
Yuzvendra Chahal 52 49 91 Bowler

Test set:
Name Matches Runs_scored Wickets_taken Label
Vijay Shankar 12 223 4 ?
Dinesh Karthik 94 1752 0 ?
Supervised Learning

Classification

x → Classification → y (discrete)

(C) Dhruv Batra
Image Classification
• Im2tags; Im2text
• http://deeplearning.cs.toronto.edu/

Example tags: Pizza, Wine, Stove

(C) Dhruv Batra
Face Recognition

http://developers.face.com/tools/

(C) Dhruv Batra
Slide Credit: Noah Snavely
Machine Translation

(C) Dhruv Batra


Figure Credit: Kevin Gimpel
Supervised Learning

Regression
x → Regression → y (continuous)

(C) Dhruv Batra
Stock market

(C) Dhruv Batra
Weather prediction

Temperature

(C) Dhruv Batra
Slide Credit: Carlos Guestrin
Supervised learning algorithms
• k-Nearest Neighbors: k-neighbors classification, k-neighbors regression
• Linear Models: linear regression, and logistic regression for classification
• Naive Bayes Classifiers
• Decision Trees
• Random forests
• Support Vector Machines (SVM)
• Neural Networks (Deep Learning)



Supervised and unsupervised learning:
• Supervised learning:
• Given a set of feature/label pairs, find a rule that predicts the label associated with a previously unseen set of features.
Examples-
• Regression
• Classification

• Unsupervised learning:
• Given a set of feature vectors (without labels), group them into ‘natural clusters’ (or create labels for the groups).
Example-
• Clustering
• Dimensionality Reduction



Unsupervised Learning

Clustering
x → Clustering → y (discrete)

Unsupervised learning: y is not provided

(C) Dhruv Batra
Clustering
Name Matches Runs_scored Wickets_taken
Shikhar Dhawan 136 5688 0
Rohit Sharma 224 9115 8
K. L. Rahul 32 1239 0
Virat Kohli 248 11867 0
M. S. Dhoni 350 10773 1
Kedar Jadhav 73 1389 27
Vijay Shankar 12 223 4
Dinesh Karthik 94 1752 0
Hardik Pandya 54 957 54
Ravindra Jadeja 165 2296 187
Bhuvneshwar Kumar 114 526 132
Mohammed Shami 77 147 114
Jasprit Bumrah 64 19 104
Kuldeep Yadav 60 118 104
Yuzvendra Chahal 52 49 91



Clustering



Clustering Data: Group similar things

(C) Dhruv Batra
Slide Credit: Carlos Guestrin
Face Clustering

iPhoto

Picasa

(C) Dhruv Batra
Unsupervised learning
• No known output, no teacher to instruct the learning algorithm
• the learning algorithm is just shown the input data and asked to extract knowledge
from this data
• Clustering and association rule mining are unsupervised
• Examples of unsupervised problems:
• Identifying topics in a set of blog posts
• If you have a large collection of text data, you might want to summarize it and find prevalent
themes in it. You might not know beforehand what these topics are, or how many topics there
might be. Therefore, there are no known outputs.
• Segmenting customers into groups with similar preferences
• Given a set of customer records, you might want to identify which customers are similar, and
whether there are groups of customers with similar preferences. For a shopping site, these
might be “parents,” “bookworms,” or “gamers.” Because you don’t know in advance what
these groups might be, or even how many there are, you have no known outputs
• Detecting abnormal access patterns to a website
Application of Machine Learning
• Computational Biology
(Structure learning)
• Animation and Control
• Tracking and activity
recognition



Application of Machine Learning
• Applications in speech and natural language processing
• Probabilistic Context Free
Grammars

• Graphical Models
• Social network graph
analysis, causality
analysis
Challenges of Data Mining/ML
• Scalability
• Dimensionality
• Complex and Heterogeneous Data
• Data Quality
• Data Ownership and Distribution
• Privacy Preservation
• Streaming Data



AI vs ML vs Data Science

