Unit - 1 Introduction to Data Science and Big Data

Data science is concerned with how data can be used to support
better decision-making and to solve complex problems more simply.

It processes huge amounts of structured, semi-structured, and
unstructured data to extract meaningful insights. From these
insights, patterns can be identified that help in making decisions
about new business opportunities, the improvement of a product or
service, and ultimately business growth.
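
To make the three kinds of data concrete, here is a minimal Python
sketch. The sample records (the cities, the order, and the review
text) are invented for illustration and do not come from any real
data set.

```python
import csv
import io
import json

# Structured: fixed columns, as in a database table or a CSV file.
structured = io.StringIO("customer_id,city,amount\n1,Pune,250\n2,Mumbai,400\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing fields, but records may differ in shape.
semi_structured = json.loads(
    '{"customer_id": 1, "orders": [{"item": "book", "qty": 2}]}'
)

# Unstructured: free text with no predefined fields.
unstructured = "Delivery was late but the book is great"

# Each kind of data needs a different way of getting at its content.
total_amount = sum(int(r["amount"]) for r in rows)
items_ordered = sum(o["qty"] for o in semi_structured["orders"])
word_count = len(unstructured.split())

print(total_amount, items_ordered, word_count)
```

Structured data can be queried by column name, semi-structured data
has to be navigated field by field, and unstructured text needs some
form of processing (here, just word splitting) before anything can be
measured.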

AI
Artificial intelligence is a technology with which we can create
intelligent systems that simulate human intelligence.

An artificial intelligence system does not need to be explicitly
pre-programmed for every task.

Machine Learning

It is about extracting knowledge from data.

Machine learning is a subfield of artificial intelligence that enables
machines to learn from past data or experiences without being
explicitly programmed.
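
The idea of learning from past data rather than being explicitly
programmed can be sketched in a few lines of Python. This is an
illustration only: the data points are made up, and simple
least-squares line fitting stands in for machine learning in general.

```python
# Past observations (invented for illustration): hours studied -> exam score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The rule (slope and intercept) is learned from the data, not written by hand.
slope, intercept = fit_line(hours, scores)


def predict(x):
    """Apply the learned rule to a new, unseen input."""
    return slope * x + intercept


print(predict(6.0))
```

Nobody wrote the rule "score = 2 * hours + 50" into the program; it
was recovered from the past observations and then applied to a new
input.
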

Big Data

Big data is a collection of data sets so large and complex that they
become difficult to process using traditional database management
system (DBMS) tools.

"Big Data" consists of very large volumes of heterogeneous data that
are being generated, often at high speed.

Applications of Data Science

Transport
Data science has also entered the transport field, for example in
driverless cars. Driverless cars can help reduce the number of
accidents.
In driverless cars, training data is fed into the algorithm, and the
data is analyzed with the help of data science techniques.

E-commerce
E-commerce websites such as Amazon and Flipkart use data science to
provide a better user experience through personalized recommendations.
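
As a rough illustration of what a personalized recommendation can
look like, the sketch below suggests items bought by customers with
overlapping purchases. The customers, items, and the `recommend`
function are hypothetical, and this simple co-purchase counting is
not a description of how Amazon or Flipkart actually work.

```python
from collections import Counter

# Hypothetical purchase histories (invented for illustration).
purchases = {
    "asha": {"laptop", "mouse", "keyboard"},
    "ravi": {"laptop", "mouse"},
    "meera": {"laptop", "headphones", "mouse"},
    "john": {"novel"},
}


def recommend(user, purchases, top_n=2):
    """Suggest items bought by customers who share a purchase with `user`."""
    own = purchases[user]
    counts = Counter()
    for other, items in purchases.items():
        # Skip the user themselves and customers with nothing in common.
        if other == user or not (own & items):
            continue
        # Count only items the user does not already have.
        counts.update(items - own)
    # Sort by count (descending), then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend("ravi", purchases))
```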

Banking
Banking is one of the biggest application areas of data science. With
it, banks can manage their resources efficiently and make smarter
decisions through fraud detection, management of customer data, risk
modeling, real-time predictive analytics, customer segmentation, etc.
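
One very simple way to approach fraud detection is to flag
transactions that lie far from a customer's usual spending. The
sketch below does this with a standard-deviation rule. The amounts,
the `flag_unusual` function, and the threshold of 2.0 are assumptions
made for illustration; real fraud detection systems are far more
elaborate.

```python
from statistics import mean, stdev


def flag_unusual(amounts, threshold=2.0):
    """Return amounts more than `threshold` standard deviations from the mean."""
    mu = mean(amounts)
    sigma = stdev(amounts)
    if sigma == 0:
        # All amounts are identical, so nothing stands out.
        return []
    return [a for a in amounts if abs(a - mu) / sigma > threshold]


# Hypothetical transaction amounts for one customer (invented for illustration).
transactions = [120, 95, 130, 110, 105, 125, 100, 115, 5000]
print(flag_unusual(transactions))
```

A flagged amount is only a candidate for review, not proof of fraud.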

Manufacturing
– Optimizing production
– Reducing costs
– Boosting profits

Data Explosion

The rapid increase in the amount of data generated and stored in
computing systems, to the point where data management becomes
difficult, is called “Data Explosion”.

The key drivers of data growth are the following:

– Increase in storage capacities.
– Cheaper storage.
– Increase in data processing capabilities of modern computing
devices.
– Data generated and made available by different sectors.

Five V’s of big data

We can identify big data by a few characteristics that are specific
to it. These are known as the Five V’s of big data.

• Volume
It refers to the amount of data that exists.
If the volume of data is large enough, it can be considered big data.

• Velocity
It refers to how quickly data is generated and how quickly that data
moves.

• Variety
It refers to the diversity of data types.
An organization might obtain data from a number of different data
sources, which may vary in value. Data can come from sources both
inside and outside an enterprise.

• Veracity
It refers to the quality and accuracy of data. Gathered data could
have missing pieces, may be inaccurate, or may not be able to provide
real, valuable insight.

• Value
It refers to the value that big data can provide, and it relates
directly to what organizations can do with the collected data.
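
Of the five V's, veracity is the easiest to check in code. The sketch
below measures how often a field is missing and lists values that are
present but implausible. The records, the function names, and the
assumed valid age range of 0 to 120 are all hypothetical choices made
for illustration.

```python
# Hypothetical survey records (invented for illustration).
# None marks a missing value.
records = [
    {"name": "Asha", "age": 29, "city": "Pune"},
    {"name": "Ravi", "age": None, "city": "Mumbai"},
    {"name": "Meera", "age": 41, "city": None},
    {"name": "John", "age": -5, "city": "Delhi"},  # present but implausible
]


def missing_rate(records, field):
    """Fraction of records in which `field` is missing (None)."""
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)


def implausible_ages(records, low=0, high=120):
    """Records whose age is present but outside the assumed valid range."""
    return [
        r for r in records
        if r["age"] is not None and not (low <= r["age"] <= high)
    ]


print(missing_rate(records, "age"))
print([r["name"] for r in implausible_ages(records)])
```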

Relation between DS and IS


DS is about the discovery of knowledge from data.
Data science is a field in which information and knowledge are
extracted from data using different algorithms and processes.
Data science is used in business functions such as strategy formation
and decision making.

IS is about the design of practices for storing and retrieving
information.
IS is used in areas such as knowledge management and data management.

Data Science Life Cycle

The data science life cycle is a defined procedure with five
important steps: gathering data, cleaning data, exploring data,
modeling data, and interpreting data.

Gathering/Collecting Data

Before creating any new product, organizations need to collect data
to research demand, customer preferences, competitors, etc.
If this data is not collected in advance, the rate of failure for the
new product is 80 percent or even higher.

There are two main methods of data collection:

1. Primary Data Collection

• Interviews
• Observations
• Surveys and Questionnaires
• Focus Groups
• Oral Histories

2. Secondary Data Collection

Secondary data refers to data that has already been collected by
someone else.
• It is much less expensive and easier to collect than primary data.
• While primary data collection provides more authentic and
original data, there are numerous instances where secondary
data collection provides great value to organizations.
Example: the Internet

Cleaning Data
This step involves scrubbing and filtering the data.
Here we remove duplicate or irrelevant observations.
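
Removing duplicate and irrelevant observations can be sketched as
follows. The observations, the `clean` function, and the rule that
"test-item" rows are irrelevant are assumptions made for this
example; what counts as irrelevant depends on the analysis at hand.

```python
# Hypothetical raw observations (invented for illustration).
raw = [
    {"id": 1, "product": "phone", "rating": 4},
    {"id": 1, "product": "phone", "rating": 4},      # exact duplicate
    {"id": 2, "product": "laptop", "rating": 5},
    {"id": 3, "product": "test-item", "rating": 1},  # irrelevant to the analysis
]


def clean(observations, is_relevant):
    """Drop exact duplicates and observations rejected by `is_relevant`.

    The original order of the remaining observations is kept.
    """
    seen = set()
    cleaned = []
    for obs in observations:
        # A sorted tuple of (field, value) pairs identifies identical records.
        key = tuple(sorted(obs.items()))
        if key in seen or not is_relevant(obs):
            continue
        seen.add(key)
        cleaned.append(obs)
    return cleaned


cleaned = clean(raw, is_relevant=lambda obs: obs["product"] != "test-item")
print([obs["id"] for obs in cleaned])
```
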
Exploring Data
Modeling Data
Interpreting Data
