DS - NLP

DATA
SCIENCE
NLP
By
ISRAR ALI
Basic Concepts
Terminologies, Motivation for the subject, Definitions,
Examples
❖ 1. Google: processes 24 peta bytes of data per day.
❖ 2. Facebook: 10 million photos uploaded every hour.
❖ 3. Youtube: 1 hour of video uploaded every second.
❖ 4. Twitter: 400 million tweets per day.
❖ 5. Astronomy: Satellite data is in hundreds of PB.
❖ 7. By 2020 the digital universe will reach 44
zettabytes..."
The Digital Universe of Opportunities: Rich Data and

the
Data
Increasing Value of the Internet of Things, April 2014.
everywhere!
That's 44 trillion gigabytes!
Data comes in different sizes and also flavors
(types):
Texts
Numbers
Graphs
Data types
Tables
Images
Transactions
Videos
Smile, we are
'DATAFIED'!
Wherever we go, we are “datafied".
Smartphones are tracking our locations.
We leave a data trail in our web browsing.
Interaction in social networks.
Privacy is an important issue in Data Science.
Internet of Things (IoT)
Lots of data
Lots of computation
Various types of communication
(machine-to-machine, machine-to-human)
The Data Science process
What is Data Mining?
Data mining focuses using machine

learning, pattern recognition and
statistics to discover patterns in data.
Interdisciplinary
field
What is Artificial Intelligence?
? What is Intelligence?
Intelligence is the computational part of the ability to achieve
goals in the world. Varying kinds and degrees of intelligence occur
in people and animals.
It is the science and engineering of making intelligent machines, especially

intelligent computer programs. It is related to the similar task of using
? What is Artificial Intelligence?
computers to understand human intelligence.
What is Artificial Intelligence?
Artificial Intelligence: some algorithm to enable computers to
perform actions we dene as requiring intelligence.
Examples:
❖ Search Based Heuristic Optimization
❖ Evolutionary computation (genetic algorithms)
❖ Logic Programming (inductive logic programming,
❖ fuzzy logic)
❖ Probabilistic Reasoning Under Uncertainty (Bayesian networks)
❖ Computer Vision
❖ Natural Language Processing
❖ Robotics
❖ Machine Learning
What is Learning?
“To gain knowledge or understanding of, or skill in by

study, instruction or experience''
Learning a set of new facts
Learning HOW to do something
Improving ability of something already learned
Simon's definition:
``Learning denotes changes in the system that are adaptive in

the sense that they enable the system to do the same task or
tasks drawn from the same population more effectively the
next time''
What is Machine Learning?
? Machine learning is a core sub-area of artificial intelligence;
? It enables computers to get into a mode of self-learning
without being explicitly programmed.
? When exposed to new data, these computer programs are
enabled to learn, grow, change, and develop by themselves.
Artificial Intelligence
(AI)
Graphical View
Machine
Learning
Neural Networks
Deep Learning/CNN
❖ Learning is used when:
❖ Human expertise does not exist

(navigating on Mars),
Why ❖ Humans are unable to explain their

expertise (speech recognition)
“Learn” ?
❖ Solution changes in time (routing on a
computer network)
❖ Solution needs to be adapted to

particular cases (user biometrics)
Why we need
AI/ML ?
The objective is to build an intelligent
system/agent which can think like human
and should be adoptive in nature.
To build an intelligent system we build

models. These models are based on
AI/ML techniques.
Models are trained to solve the problems

by using some experience/data
❖ Learning to recognize spoken words (Lee,
1989; Waibel, 1989).
❖ Learning to drive an autonomous vehicle
(Pomerleau, 1989).
❖ Learning to classify new astronomical
structures (Fayyad et al., 1995). Examples of
❖ Learning to play world-class backgammon Successful
(Tesauro 1992, 1995).
❖ the self-driving Google car,
Applications
❖ cyber fraud detection,
of Machine
❖ online recommendation engines—like friend
Learning
suggestions on Facebook, Netflix showcasing
the movies and shows you might like, and
“more items to consider” and “get yourself a
little something” on Amazon.
❖ Predicting Probability of Default for Banks
Summary
Learning
Tasks, Types, Methods, Applications
Classification
Regression (function
approximation, curve fitting,
Learning estimation)
Tasks Prediction
Clustering/Association (Data
Mining)
Learning Tasks (Contd.)
How Classification is performed?
a1x+b1y+c1 a3x+b3y+c3
Y Y
Ba
Ba
ck
Signal
ck
Signal
gr
gr
ou
ou
nd
nd
a2x+b2y+c2
X X
Event sample characterized by two variables X and Y (left figure)
A linear combination of cuts can separate “signal” from “background” (right
fig.) “Signal (x, y)” OUT
Define “step function” “Signal (x, y)” IN
Separate “signal” from “background” with the following function:

What is Regression?
A technique for determining the

statistical relationship between two
or more variables where a change in
a dependent variable is associated
with, and depends on, a change in
one or more independent variables.
What is Clustering
❖ Clustering is an unsupervised machine learning method that

attempts to uncover the natural groupings and statistical
distributions of data.
❖ There are multiple clustering methods such as K-means or
Hierarchical Clustering.
❖ Often, a measure of distance from point to point is used to
find which category a point should belong to as with
K-means.
Clustering
Applications of clustering
Market segmentation Social network analysis
Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin,

Madison)
Organize computing clusters Astronomical data analysis

Learning Types
Supervised (Labeled data is available)
Unsupervised (Labeled data is not available)

Semi-Supervised (Labeled data is partially
available)
Reinforcement
Supervised Learning
In supervised learning, we are given a data set and already know what
our correct output should look like, having the idea that there is a
relationship between the input and the output.
Supervised learning problems are categorized into "regression" and

"classification" problems.
In a regression problem, we are trying to predict results within a

continuous output, meaning that we are trying to map input variables
to some continuous function.
In a classification problem, we are instead trying to predict results in a

discrete output.
In other words, we are trying to map input variables into discrete
categories
Example 1:
Given data about the size of houses on the real estate market, try to predict their
price. Price as a function of size is a continuous output, so this is a regression
problem.
Example 2:
(a) Regression - Given a picture of Male/Female, We have to predict his/her age on
the basis of given picture.
(b) Classification - Given a picture of Male/Female, We have to predict Whether

He/She is of High school, College, Graduate age. Another Example for Classification
- Banks have to decide whether or not to give a loan to someone on the basis of his
credit history.
Supervised Learning (Contd.)
Unsupervised Learning
Unsupervised learning, on the other hand, allows us to approach problems with

little or no idea what our results should look like. We can derive structure from
data where we don't necessarily know the effect of the variables.
We can derive this structure by clustering the data based on relationships among
the variables in the data.
With unsupervised learning there is no feedback based on the prediction results,
i.e., there is no teacher to correct you.

Clustering: Take a collection of 1000 essays written on the US Economy, and find a way
to automatically group these essays into a small number that are somehow similar or
related by different variables, such as word frequency, sentence length, page count, and
so on.
Non-clustering: The "Cocktail Party Algorithm", which can find structure in

messy data (such as the identification of individual voices and music from a
mesh of sounds at a cocktail party
Unsupervised Learning (Contd.)
Reinforcement Learning
Reinforcement learning is a special case of supervised learning,

where the exact desired output is unknown. The teacher supplies only
feedback about success or failure of an answer.
This is cognitively more plausible than supervised learning since a

fully specified correct answer might not always be available to the
learner or even the teacher.
Reinforcement Learning (Contd.)
•Your cat is an agent that is exposed to the environment. In
this case, it is your house. An example of a state could be
your cat sitting, and you use a specific word in for cat to walk.
•Our agent reacts by performing an action transition from one

"state" to another "state.“
•For example, your cat goes from sitting to walking.
•The reaction of an agent is an action, and the policy is a

method of selecting an action given a state in expectation of
better outcomes.
•After the transition, they may get a reward or penalty in

return.

DS - NLP

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

DS - NLP

Uploaded by

Copyright:

Available Formats

DATA

The Digital Universe of Opportunities: Rich Data and

Data mining focuses using machine

It is the science and engineering of making intelligent machines, especially

“To gain knowledge or understanding of, or skill in by

``Learning denotes changes in the system that are adaptive in

❖ Human expertise does not exist

Why ❖ Humans are unable to explain their

❖ Solution needs to be adapted to

To build an intelligent system we build

Models are trained to solve the problems

Separate “signal” from “background” with the following function:

A technique for determining the

❖ Clustering is an unsupervised machine learning method that

Market segmentation Social network analysis

Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin,

Organize computing clusters Astronomical data analysis

Supervised (Labeled data is available)

Unsupervised (Labeled data is not available)

Supervised learning problems are categorized into "regression" and

In a regression problem, we are trying to predict results within a

In a classification problem, we are instead trying to predict results in a

(b) Classification - Given a picture of Male/Female, We have to predict Whether

Unsupervised learning, on the other hand, allows us to approach problems with

With unsupervised learning there is no feedback based on the prediction results,

i.e., there is no teacher to correct you.

Non-clustering: The "Cocktail Party Algorithm", which can find structure in

Reinforcement learning is a special case of supervised learning,

This is cognitively more plausible than supervised learning since a

•Our agent reacts by performing an action transition from one

•For example, your cat goes from sitting to walking.

•The reaction of an agent is an action, and the policy is a

•After the transition, they may get a reward or penalty in

You might also like