
20CS254

FOUNDATIONS OF DATA SCIENCE
Course Outcomes

On successful completion of the course, the students will be able to

CO1: Recall the basic concepts of big data and data science. (PO1, PO2, PO12)

CO2: Utilize statistical concepts of big data collection, data analysis, modelling, and inference. (PO1, PO2, PO4, PO5, PO12)

CO3: Identify appropriate data mining algorithms to solve real world problems. (PO1, PO2, PO3, PO4, PO5, PO12)

CO4: Analyze data, relevant models and tools for respective applications. (PO1, PO2, PO3, PO4, PO5, PO12)
SYLLABUS DISCUSSION
MODULE I
OVERVIEW OF BIG DATA AND DATA SCIENCE
Data

• Data is a collection of facts, such as numbers, words, measurements, observations or just descriptions of things.
Information

• Information is processed, organized and structured data. It provides context for data and enables decision making.
Data Vs Information
Data Vs Information Examples Chart

Data: each individual homework and test grade of a student in one class
Information: the student's average grade for each class

Data: typing the words "cat videos" in your computer search engine (input)
Information: the list of search results that includes a variety of cat videos on the internet (output)
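The first row of the chart can be sketched in a few lines of Python. This is only an illustration: the subject names, the grade values, and the helper `average_per_class` are hypothetical and not part of the course material.

```python
from statistics import mean

# Data: raw, individual grades (hypothetical values for illustration).
grades = {
    "Math": [78, 85, 92],
    "Physics": [64, 70, 73],
}

def average_per_class(raw_grades):
    """Turn raw grades (data) into one average per class (information)."""
    return {subject: mean(scores) for subject, scores in raw_grades.items()}

averages = average_per_class(grades)
for subject, avg in averages.items():
    print(f"{subject}: {avg:.1f}")
```

The individual scores are the data; the per-class averages are the information a teacher or student would actually act on.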
Knowledge

Information can be converted into knowledge about historical patterns and future trends.

Knowledge and information both have data as an essential component.

◦ Both knowledge and information can be identified by observation, stored, retrieved and processed further.
◦ Information can be processed into knowledge; knowledge can be communicated as information.
Database

A database is a collection of information that is organized so that it can be easily accessed, managed and updated.
Big data
Introduction to Big Data

• Data that is very large in size is called Big Data.

• Normally we work with data measured in megabytes (Word documents, Excel sheets) or at most gigabytes (movies, code), but data at the scale of petabytes, i.e. about 10^15 bytes, is called Big Data. It is stated that almost 90% of today's data has been generated in the past 3 years.
Units of data

• The bit
• The Byte
• Kilobyte (1,024 Bytes)
• Megabyte (1,024 Kilobytes)
• Gigabyte (1,024 Megabytes, or 1,048,576 Kilobytes)
• Terabyte (1,024 Gigabytes)
• Petabyte (1,024 Terabytes, or 1,048,576 Gigabytes)
• Exabyte (1,024 Petabytes)
• Zettabyte (1,024 Exabytes)
• Yottabyte (1,024 Zettabytes, or 1,208,925,819,614,629,174,706,176 bytes)
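The unit ladder above follows one rule: each unit is 1,024 times the previous one. A minimal Python sketch of that rule is shown below; the names `UNITS`, `bytes_in`, and `human_readable` are illustrative helpers, not part of any library.

```python
# Binary unit names in increasing order; each step is a factor of 1024.
UNITS = ["Byte", "Kilobyte", "Megabyte", "Gigabyte", "Terabyte",
         "Petabyte", "Exabyte", "Zettabyte", "Yottabyte"]

def bytes_in(unit):
    """Return the number of bytes in one `unit`, using 1024 per step."""
    return 1024 ** UNITS.index(unit)

def human_readable(n_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(n_bytes)
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}s"
        value /= 1024

print(bytes_in("Petabyte"))
print(human_readable(5 * 1024 ** 3))
```

With this convention a petabyte is 2^50 bytes, which is roughly 1.13 x 10^15 and therefore slightly more than the decimal figure of 10^15 bytes.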
Sources of Big Data

• Social networking sites: Facebook, Google, and LinkedIn generate huge amounts of data every day, as they have billions of users worldwide.

• E-commerce sites: Sites like Amazon, Flipkart, and Alibaba generate huge volumes of logs from which users' buying trends can be traced.

• Weather stations: Weather stations and satellites produce very large amounts of data, which are stored and processed to forecast the weather.

• Telecom companies: Telecom giants like Airtel and Vodafone study user trends and design their plans accordingly; for this they store the data of their millions of users.

• Share market: Stock exchanges across the world generate huge amounts of data through their daily transactions.
Types Of Big Data

• Structured
• Unstructured
• Semi-structured
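As a rough illustration of the three types, the Python sketch below loads one small sample of each. The sample records are invented, and the short descriptions in the comments reflect the common usage of these terms rather than definitions given on the slides.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
structured = io.StringIO("id,name,amount\n1,Asha,250\n2,Ravi,120\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing keys, but fields may vary per record.
semi_structured = json.loads(
    '[{"id": 1, "tags": ["sale"]}, {"id": 2, "note": "gift wrap"}]'
)

# Unstructured: free text with no predefined fields.
unstructured = "Customer called to say the parcel arrived late but intact."

print(rows[0]["name"])
print(sorted(semi_structured[1].keys()))
print(len(unstructured.split()))
```

The structured rows can be queried by column name right away, the JSON records need a check for which keys are present, and the free text has to be parsed or tokenized before any analysis.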
Characteristics of Big Data
Applications of Big data
Big data

• https://www.youtube.com/watch?v=TzxmjbL-i4Y
Definition – Data Warehouse

• Data warehousing (DW) is the process of collecting and managing data from varied sources to provide meaningful business insights.
Data Warehouse
Definition-Data Mining

• In simple words, data mining is defined as a process used to extract usable data from a larger set of raw data.
Data Mining

• Data mining (knowledge discovery from data)
◦ Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
• Alternative names
◦ Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
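To make "extraction of patterns" concrete, here is a minimal sketch that counts which pairs of items are bought together often. It is a simple co-occurrence count chosen for illustration, not an algorithm prescribed by the course; the baskets and the helper `frequent_pairs` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]

def frequent_pairs(baskets, min_count):
    """Count item pairs that co-occur in at least `min_count` baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

patterns = frequent_pairs(transactions, min_count=2)
print(patterns)
```

The pair that survives the threshold was not stated anywhere in the raw baskets; it is implicit knowledge that only appears after scanning all of them, which is the sense of "previously unknown and potentially useful" above.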
KDD Process
Data Analysis
Data Analytics

• Data analytics (DA) is the process of examining data sets in order to find trends and draw conclusions about the information they contain.
Data Science
Life Cycle of Data Science
Data Acquisition
Data Pre-processing
Model Building & Pattern Evaluation
Knowledge Representation
Importance of Data Science

• https://www.youtube.com/watch?v=CCnCABJhAdU
• https://www.youtube.com/watch?v=lSwIe0TMUhc
