
Introduction to Data Science

Dr. V. Masilamani
Why Learn Data Science
Some views about Data Science
• In 1985, Chien-Fu Jeff Wu proposed that statistics should
be renamed data science
• Peter Naur (1974), Turing Award winner (ref:
ACM Computing Surveys (2017)): an alternative
name for computer science is data science
What is Data Science
• Data science is an interdisciplinary field that
uses scientific methods, processes, algorithms
and systems to extract knowledge and insights
from structured and unstructured data, and
apply knowledge and actionable insights
obtained from data across a broad range of
application domains
– Communications of the ACM
Data Science - Definition
• Discovering facts of a system from the data
generated by the system
• Science means discovering facts
• Data science refers to discovering facts from
data
• Example: Given the marks of a class of students,
check whether the class is good
– If the average score computed from the data is more
than 90%, the class is considered good
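The marks example above can be sketched in a few lines of Python. The marks list and the function name `class_is_good` are illustrative assumptions; only the 90% threshold comes from the slide.

```python
from statistics import mean

# Hypothetical marks (in percent) for one class; not taken from the slides.
marks = [95, 88, 92, 97, 91, 94]

def class_is_good(scores, threshold=90.0):
    """Return True when the average score is above the threshold."""
    return mean(scores) > threshold

average = mean(marks)
print(f"average = {average:.2f}%, class is good: {class_is_good(marks)}")
```

The "fact" discovered here is only as strong as the rule chosen: a different threshold, or the median instead of the mean, could give a different answer for the same data.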
Data Science Vs Science
• Data Science: Discovering
information (facts/knowledge) from given data
• Paradigms of Science
– Empirical
– Theoretical
– Computational
– Data-driven (new paradigm, Jim Gray (Turing Award
winner))
Data science Vs Computer Science
DA Vs DE Vs DS
Examples of Data Science Problems
• Problem statement: Given a data set of
Airtel customers in India, find out why customers
leave the service.
• To discover the facts and answer the question:
– Form hypotheses
• Hypothesis 1: Poor connectivity is the reason for leaving
• Hypothesis 2: High tariff is the reason for leaving
• Hypothesis 3: Poor customer service is the reason for leaving
– Do hypothesis testing or build an ML model (e.g.
clustering) to accept or reject each hypothesis
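As a minimal sketch of the hypothesis-testing step for Hypothesis 1, the snippet below runs a two-sided two-proportion z-test. The counts are hypothetical, the function name is an assumption, and the z-test is one possible choice of test rather than the one prescribed by the slides. The null hypothesis is that customers leave at the same rate whether or not they report poor connectivity.

```python
import math

def two_proportion_z_test(leavers_a, total_a, leavers_b, total_b):
    """Two-sided z-test for equality of two proportions (pooled estimate)."""
    p_a = leavers_a / total_a
    p_b = leavers_b / total_b
    pooled = (leavers_a + leavers_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 120 of 400 customers with poor connectivity left,
# against 60 of 600 customers with good connectivity.
z, p_value = two_proportion_z_test(120, 400, 60, 600)
alpha = 0.05
reject_null = p_value < alpha
print(f"z = {z:.2f}, p = {p_value:.3g}, reject null: {reject_null}")
```

Rejecting the null hypothesis shows an association between poor connectivity and leaving; it does not by itself prove that connectivity is the cause.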
Examples of Data Science Problems
• Given the sales data of a product over a few years, find out
the reason for the decline in sales
– Several hypotheses may be framed and tested
• Given credit card transaction data, find out what
characterizes a fraudulent transaction
• Given glacier melting data, find out the reason for the
melting
• Given trading data, find out the reason for a very high
increase in the price of a particular stock
• Given Drug-ADR (adverse drug reaction) data, find out the causes of the ADR (e.g. find
the component of the drug causing the ADR)
Purpose of Data Science
Applications of Data Science
Application of Data Science
Application of Data Science in Business
Application of Data Science in Healthcare
Advantages vs Disadvantages of Data Science
Data Science Life Cycle
Data Analytics
Data Science Vs Data Analytics
Process in Data Science
Data Science Hierarchy
Data Science Vs AI
Data Science Vs Machine Learning
• DS is partly automated, whereas ML is fully
automated
• In the seven-stage life cycle of DS, only modelling
(learning the parameters of the model) and
evaluating the model are automatic
• The other five stages involve human (expert)
intervention
– E.g. in data exploration, a human can discover patterns
through a scientific process such as visualizing
properties of the data (mean, variance, moments)
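The properties mentioned above (mean, variance, moments) can be computed with the Python standard library before plotting them. This is a minimal sketch; the sample values and the helper name `central_moment` are illustrative assumptions.

```python
from statistics import fmean, pvariance

def central_moment(values, k):
    """k-th central moment: the average of (x - mean) ** k."""
    centre = fmean(values)
    return fmean([(x - centre) ** k for x in values])

# Hypothetical sample; not taken from the slides.
values = [2, 4, 4, 4, 5, 5, 7, 9]

print("mean              :", fmean(values))
print("variance          :", pvariance(values))
print("3rd central moment:", central_moment(values, 3))
```

The second central moment equals the population variance, and a positive third central moment indicates a longer tail to the right. Interpreting such numbers is the part that still needs a human.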
DS – A case study
• Problem statement: Given the data (student name,
roll_no, grade, feedback), find a process to improve
the students' satisfaction (feedback)
• Stage 1: Understand what is required: define a
procedure to improve the students' feedback
• Stage 2: Collect the data mentioned in the problem
statement
• Stage 3: Data cleaning
– a) Remove/correct noisy data
– b) Store the data in a format that allows faster
processing
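Stage 3 can be sketched as follows. The records, the 1 to 5 feedback scale, and the function name `clean` are assumptions made for illustration; only the field names (name, roll_no, grade, feedback) come from the problem statement.

```python
# Hypothetical raw records; the feedback scale of 1 to 5 is an assumption.
raw_records = [
    {"name": " Asha ", "roll_no": "CS001", "grade": "A", "feedback": "4.5"},
    {"name": "Ravi", "roll_no": "EC002", "grade": "B", "feedback": "11"},
    {"name": "Meena", "roll_no": "CS003", "grade": "a", "feedback": "3"},
    {"name": "Kiran", "roll_no": "EC004", "grade": "C", "feedback": "n/a"},
]

def clean(records, low=1.0, high=5.0):
    """Drop noisy feedback values, normalize fields, and index by roll_no."""
    cleaned = {}
    for rec in records:
        try:
            score = float(rec["feedback"])
        except ValueError:
            continue  # a) remove records whose feedback is not a number
        if not low <= score <= high:
            continue  # a) remove records whose feedback is out of range
        cleaned[rec["roll_no"]] = {
            "name": rec["name"].strip(),    # correct stray whitespace
            "grade": rec["grade"].upper(),  # correct inconsistent casing
            "feedback": score,
        }
    return cleaned  # b) a dict keyed by roll_no gives fast lookup

students = clean(raw_records)
print(students)
```

Whether a noisy record is removed or corrected is a judgment call; here unparseable and out-of-range feedback is dropped, while whitespace and casing problems are corrected.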
DS – A case study
• Stage 4: Data exploration
– Print the frequency of students with respect to feedback
range, department, JEE Main rank range, previous
knowledge of the subject, etc.
– Find the factors that determine low feedback
– Devise a method to increase feedback
• Suppose ECE students give poor feedback and CSE students give very
good feedback: split the class and do customized teaching
• Suppose students with prior knowledge give good feedback and the
others do not: split the class, teach more basics slowly to the group
without prior knowledge, and teach advanced topics to the other group
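The exploration in Stage 4 can be sketched as a frequency count per feedback range plus the average feedback per group. The student records, the band boundaries, and the helper names are illustrative assumptions.

```python
from collections import Counter, defaultdict
from statistics import fmean

# Hypothetical cleaned records; feedback is assumed to be on a 1 to 5 scale.
students = [
    {"dept": "CSE", "prior_knowledge": True, "feedback": 4.6},
    {"dept": "CSE", "prior_knowledge": True, "feedback": 4.2},
    {"dept": "CSE", "prior_knowledge": False, "feedback": 3.9},
    {"dept": "ECE", "prior_knowledge": False, "feedback": 2.1},
    {"dept": "ECE", "prior_knowledge": False, "feedback": 2.8},
    {"dept": "ECE", "prior_knowledge": True, "feedback": 4.0},
]

def feedback_band(score):
    """Map a feedback score to a coarse range (boundaries are assumptions)."""
    if score < 3.0:
        return "low"
    if score < 4.0:
        return "medium"
    return "high"

def mean_feedback_by(records, key):
    """Average feedback for each distinct value of the given field."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec["feedback"])
    return {value: fmean(scores) for value, scores in groups.items()}

band_counts = Counter(feedback_band(s["feedback"]) for s in students)
by_dept = mean_feedback_by(students, "dept")
by_prior = mean_feedback_by(students, "prior_knowledge")

print("students per feedback range :", dict(band_counts))
print("mean feedback by department :", by_dept)
print("mean feedback by prior know.:", by_prior)
```

With data like this, both department and prior knowledge appear related to feedback, which is the kind of pattern that would motivate splitting the class.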
DS – A case study-1
• Stage 5: Modelling
– To justify scientifically that the plan (model) works
• Do (statistical) hypothesis testing, or
• Learn a predictive (ML) model and test it

• Stage 6: Evaluate the model on new data and visualise /
communicate the results
• Stage 7: Deploy the model
– Teach one group more basics and the other advanced topics
– Sometimes the model needs to run in the cloud, with the required
data stored in a cloud DB or sent from the client to the cloud on the fly.
The results may need to be stored in the cloud DB and/or transmitted
to the client machine, where the client visualizes them
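Stages 5 and 6 above (learn a predictive model, then evaluate it on new data) can be sketched with a one-variable least-squares line. The data, the choice of "hours of basics revision" as the input, and the function names are assumptions for illustration; the slides do not prescribe a particular model.

```python
from statistics import fmean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def mean_absolute_error(slope, intercept, xs, ys):
    """Average absolute gap between predicted and observed values."""
    return fmean([abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)])

# Hypothetical data: hours of basics revision (x) vs feedback score (y).
train_x = [0, 1, 2, 3, 4]
train_y = [2.0, 2.5, 3.0, 3.5, 4.0]
# New data, not used for fitting, kept aside for Stage 6.
test_x = [1.5, 3.5]
test_y = [2.8, 3.7]

slope, intercept = fit_line(train_x, train_y)             # Stage 5: modelling
mae = mean_absolute_error(slope, intercept, test_x, test_y)  # Stage 6: evaluation
print(f"feedback = {slope:.2f} * hours + {intercept:.2f}, test MAE = {mae:.2f}")
```

Evaluating on data that was not used for fitting is the point of Stage 6: a low error on the training data alone would not justify deploying the plan.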
DS – A case study-2
• A leading travel agency increased revenue by
16% and market share by 21%
– Computed customer segments
– The performance of offline sales agents was
analyzed
– Ideal hours for sales agents were noticed
DS – A case study-3
• DS is behind the success of Netflix
– Understands customer behavior
– Builds and runs the recommender engine
• Recommends which movie to watch next based on the
sequence of movies the customer watched earlier
• Recommends based on the reviews of the movie
• Engages the audience by predicting their interests
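The idea of recommending the next movie from the sequence watched earlier can be illustrated with simple transition counts: for each movie, count which movie viewers most often watched right after it. This is a toy sketch with hypothetical watch histories and function names, not a description of how Netflix's recommender engine actually works.

```python
from collections import Counter, defaultdict

def build_transitions(histories):
    """Count how often each title is watched immediately after another."""
    transitions = defaultdict(Counter)
    for history in histories:
        for current, following in zip(history, history[1:]):
            transitions[current][following] += 1
    return transitions

def recommend_next(transitions, last_watched, already_seen=()):
    """Most frequent follow-up to last_watched that is not already seen."""
    candidates = transitions.get(last_watched, Counter())
    for title, _count in candidates.most_common():
        if title not in already_seen:
            return title
    return None  # no history for this title, or everything was seen

# Hypothetical watch histories, one list per customer.
histories = [
    ["Movie A", "Movie B", "Movie C"],
    ["Movie A", "Movie B", "Movie D"],
    ["Movie B", "Movie C"],
    ["Movie A", "Movie C"],
]

transitions = build_transitions(histories)
print(recommend_next(transitions, "Movie A"))
print(recommend_next(transitions, "Movie B", already_seen={"Movie C"}))
```

A model like this looks only at the last title watched; using longer sequences or review data, as the bullets above mention, would need a richer model.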
Python for Data Science - Why
