Professional Documents
Culture Documents
SCIENCE
NLP
By
ISRAR ALI
Basic Concepts
Terminologies, Motivation for the subject, Definitions,
Examples
❖ 1. Google: processes 24 peta bytes of data per day.
❖ 2. Facebook: 10 million photos uploaded every hour.
❖ 3. Youtube: 1 hour of video uploaded every second.
❖ 4. Twitter: 400 million tweets per day.
❖ 5. Astronomy: Satellite data is in hundreds of PB.
❖ 7. By 2020 the digital universe will reach 44
zettabytes..."
Texts
Numbers
Graphs
Data types
Tables
Images
Transactions
Videos
Smile, we are
'DATAFIED'!
Wherever we go, we are “datafied".
Smartphones are tracking our locations.
We leave a data trail in our web browsing.
Interaction in social networks.
Privacy is an important issue in Data Science.
Internet of Things (IoT)
Lots of data
Lots of computation
Various types of communication
(machine-to-machine, machine-to-human)
The Data Science process
What is Data Mining?
? What is Intelligence?
Intelligence is the computational part of the ability to achieve
goals in the world. Varying kinds and degrees of intelligence occur
in people and animals.
Graphical View
Machine
Learning
Neural Networks
Deep Learning/CNN
❖ Learning is used when:
Regression (function
approximation, curve fitting,
Learning estimation)
Tasks Prediction
Clustering/Association (Data
Mining)
Learning Tasks (Contd.)
How Classification is performed?
a1x+b1y+c1 a3x+b3y+c3
Y Y
Ba
Ba
ck
Signal
ck
Signal
gr
gr
ou
ou
nd
nd
a2x+b2y+c2
X X
Event sample characterized by two variables X and Y (left figure)
A linear combination of cuts can separate “signal” from “background” (right
fig.) “Signal (x, y)” OUT
Define “step function” “Signal (x, y)” IN
In supervised learning, we are given a data set and already know what
our correct output should look like, having the idea that there is a
relationship between the input and the output.
Example 2:
(a) Regression - Given a picture of Male/Female, We have to predict his/her age on
the basis of given picture.
We can derive this structure by clustering the data based on relationships among
the variables in the data.