
Data science

By
Neha Tyagi
What is Data Science?

• Data Science is the area of study that involves extracting insights
from vast amounts of data using various scientific methods,
algorithms, and processes. It helps you discover hidden patterns
in raw data. The term Data Science emerged from the evolution of
mathematical statistics, data analysis, and big data.

• Data Science is an interdisciplinary field that allows you to extract
knowledge from structured or unstructured data. It enables you to
translate a business problem into a research project and then
translate the results back into a practical solution.

Why data science?

• Data is the oil of today's world. With the right tools, technologies,
and algorithms, we can use data and convert it into a distinctive
business advantage.
• Data Science can help you detect fraud using advanced machine
learning algorithms.
• It helps you prevent significant monetary losses.
• It allows you to build intelligent capabilities into machines.
• You can perform sentiment analysis to gauge customer brand
loyalty.
• It enables you to make better and faster decisions.
• It helps you recommend the right product to the right customer to
enhance your business.

Statistics:

• Statistics is the most critical building block of Data Science. It is
the science of collecting and analyzing numerical data in large
quantities to obtain useful insights.
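
As a small illustration of summarizing numerical data, the sketch below computes a few descriptive statistics with Python's standard `statistics` module. The five salary values are the ones shown in the structured-data employee table in these slides; the variable names are illustrative only.

```python
import statistics

# Salary values taken from the employee table in the structured-data slide.
salaries = [650000, 650000, 500000, 500000, 550000]

mean_salary = statistics.mean(salaries)      # arithmetic average
median_salary = statistics.median(salaries)  # middle value once sorted
spread = statistics.stdev(salaries)          # sample standard deviation

print(f"mean={mean_salary}, median={median_salary}, stdev={spread:.2f}")
```

The mean and median describe the typical salary, while the standard deviation indicates how widely the values are scattered around the mean.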

Visualization:

• Visualization techniques help you present huge amounts of data as
easy-to-understand, digestible visuals.
• matplotlib: A wide variety of tools exists for visualizing data. We
will be using the matplotlib library, which is widely used
(although it is showing its age somewhat). If you are interested in
producing elaborate interactive visualizations for the Web, it is
likely not the right choice, but for simple bar charts, line charts,
and scatterplots, it works pretty well.
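
A minimal sketch of a simple matplotlib bar chart follows. It assumes matplotlib is installed; the department counts are tallied from the employee table in these slides, and the output file name is a hypothetical choice. The `Agg` backend is selected only so that the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; renders to a file, not a window
import matplotlib.pyplot as plt

# Number of employees per department, counted from the employee table.
departments = ["Finance", "Admin"]
employee_counts = [3, 2]

fig, ax = plt.subplots()
bars = ax.bar(departments, employee_counts)  # one bar per department
ax.set_title("Employees per department")
ax.set_xlabel("Department")
ax.set_ylabel("Number of employees")

# Hypothetical output file name.
fig.savefig("employees_per_department.png")
```

Swapping `ax.bar` for `ax.plot` or `ax.scatter` gives a line chart or a scatterplot from the same kind of data.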

Machine learning

Machine Learning explores the building and study of algorithms that
learn to make predictions about unseen or future data.
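
To make "learning to predict" concrete, here is a minimal sketch using one of the simplest learning methods, ordinary least squares on a single feature. The experience and salary figures are hypothetical, and `fit_line` and `predict` are illustrative helper names rather than functions from any library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: years of experience vs. salary.
experience = [1, 2, 3, 4, 5]
salary = [3.0, 5.0, 7.0, 9.0, 11.0]

# "Learning" step: estimate the parameters from past observations.
slope, intercept = fit_line(experience, salary)


def predict(years):
    """Predict the salary for a value that was not in the training data."""
    return slope * years + intercept


print(predict(6))
```

The model is fitted on the five known points and then asked about a sixth, unseen value, which is the pattern the definition above describes.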

What is the Difference Between Data Science and Machine Learning?

Data Science is a combination of algorithms, tools, and machine
learning techniques that helps you find hidden patterns in raw data.
Machine learning, by contrast, is a branch of computer science that
deals with programming systems to learn and improve automatically
with experience.

ICS 243E - Ch. 1 Introduction Spring 2003 1.6


Data Science compared with Machine Learning:

• Data science: An interdisciplinary field that uses scientific
  methods, algorithms, and systems to extract knowledge from
  structured and unstructured data.
  Machine learning: The scientific study of algorithms and statistical
  models that systems use to perform a specific task.

• Data science: Helps you create insights from data while dealing with
  all real-world complexities.
  Machine learning: Helps you predict the outcome for new data from
  historical data with the help of mathematical models.

• Data science: Nearly all of the input data is generated in a
  human-readable format, which is read or analyzed by humans.
  Machine learning: Input data is transformed specifically for the
  algorithms used.

• Data science: Can also work with manual methods, though they are
  not very useful.
  Machine learning: Algorithms are hard to implement manually.

• Data science: A complete process.
  Machine learning: A single step in the entire data science process.

• Data science: Not a subset of Artificial Intelligence (AI).
  Machine learning: A subset of Artificial Intelligence (AI).

• Data science: High RAM and SSDs are used, which helps you overcome
  I/O bottleneck problems.
  Machine learning: GPUs are used for intensive vector operations.

Data scientist job roles:

The most prominent Data Scientist job titles are:
• Data Scientist
• Data Engineer
• Data Analyst
• Statistician
• Data Architect
• Data Admin
• Business Analyst
• Data/Analytics Manager

Data Scientist:
• Role: A Data Scientist is a professional who manages enormous
amounts of data to come up with compelling business insights by
using various tools, techniques, methodologies, algorithms, etc.
• Languages: R, SAS, Python, SQL, Hive, Matlab, Pig, Spark
Data Engineer:
• Role: A data engineer works with large amounts of data and
develops, constructs, tests, and maintains architectures such as
large-scale processing systems and databases.
• Languages: SQL, Hive, R, SAS, Matlab, Python, Java, Ruby, C++,
and Perl

Data Analyst:
• Role: A data analyst is responsible for mining vast amounts of data,
looking for relationships, patterns, and trends. The analyst then
delivers compelling reports and visualizations of the data to support
the most viable business decisions.
• Languages: R, Python, HTML, JS, C, C++, SQL
Statistician:
• Role: The statistician collects, analyzes, and interprets qualitative
and quantitative data using statistical theories and methods.
• Languages and tools: SQL, R, Matplotlib, Python, Perl, Spark, and Hive
Data Administrator:
• Role: The data admin ensures that the database is accessible to
all relevant users, that it is performing correctly, and that it is
kept safe from hacking.

Challenges of Data Science Technology
• A high variety of information and data is required for accurate analysis
• An adequate data science talent pool is not available
• Management does not provide financial support for a data science team
• Data is unavailable or difficult to access
• Data Science results are not used effectively by business decision makers
• Explaining data science to others is difficult
• Privacy issues
• Lack of significant domain expertise
• A very small organization cannot support a Data Science team

Traits of big data

• Big Data is a collection of data that is huge in volume and still
growing exponentially with time. Its size and complexity are so great
that no traditional data management tool can store or process it
efficiently. In short, big data is still data, but of enormous size.
Social Media
• Statistics show that 500+ terabytes of new data are ingested
into the databases of the social media site Facebook every day. This
data is mainly generated through photo and video uploads,
message exchanges, comments, etc.
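
To put the daily figure in perspective, the short calculation below scales it to a year. It assumes a constant rate equal to the quoted lower bound of 500 terabytes per day and decimal units (1 petabyte = 1000 terabytes); both are simplifying assumptions, not figures from the source.

```python
TB_PER_DAY = 500      # lower bound quoted above ("500+ terabytes")
DAYS_PER_YEAR = 365
TB_PER_PB = 1000      # decimal units: 1 PB = 1000 TB

tb_per_year = TB_PER_DAY * DAYS_PER_YEAR
pb_per_year = tb_per_year / TB_PER_PB

print(f"At least {tb_per_year} TB per year, about {pb_per_year} PB")
```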

Types of Big Data
Following are the types of Big Data:
1. Structured
2. Unstructured
3. Semi-structured
Structured
• Any data that can be stored, accessed, and processed in a fixed
format is termed ‘structured’ data. Over time, computer science has
achieved considerable success in developing techniques for working
with this kind of data (where the format is well known in advance)
and deriving value from it. However, issues now arise when the size
of such data grows to a huge extent, with typical sizes in the range
of multiple zettabytes.

Structured data

Employee_ID  Employee_Name    Gender  Department  Salary_In_lacs
2365         Rajesh Kulkarni  Male    Finance     650000
3398         Pratibha Joshi   Female  Admin       650000
7465         Shushil Roy      Male    Admin       500000
7500         Shubhojit Das    Male    Finance     500000
7699         Priya Sane       Female  Finance     550000
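
Because every row of the employee table has the same known fields, it can be loaded and queried directly. The sketch below writes the table as CSV text and computes the average salary per department using only the Python standard library; the choice of CSV and the variable names are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# The employee table from the slide, written as CSV text.
TABLE = """Employee_ID,Employee_Name,Gender,Department,Salary_In_lacs
2365,Rajesh Kulkarni,Male,Finance,650000
3398,Pratibha Joshi,Female,Admin,650000
7465,Shushil Roy,Male,Admin,500000
7500,Shubhojit Das,Male,Finance,500000
7699,Priya Sane,Female,Finance,550000
"""

# Each row becomes a dict keyed by the fixed column names.
rows = list(csv.DictReader(io.StringIO(TABLE)))

# Group salaries by department, then average each group.
salaries_by_department = defaultdict(list)
for row in rows:
    salaries_by_department[row["Department"]].append(int(row["Salary_In_lacs"]))

average_salary = {
    department: sum(values) / len(values)
    for department, values in salaries_by_department.items()
}
print(average_salary)
```

The fixed format is what makes the grouping step trivial: the code can rely on every row having a `Department` and a `Salary_In_lacs` column.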

Unstructured
• Any data whose form or structure is unknown is classified as
unstructured data. In addition to its huge size, unstructured
data poses multiple challenges when it is processed to derive
value from it. A typical example of unstructured data is a
heterogeneous data source containing a combination of simple text
files, images, videos, etc. Nowadays organizations have a wealth of
data available to them, but unfortunately they do not know how to
derive value from it, since this data is in its raw, unstructured
form.
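
One simple way to start deriving value from raw text is to impose a little structure on it, for example by counting words. The sketch below does this for a few hypothetical customer comments; the comments and the tokenizing rule (lowercase runs of letters) are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical free-text customer comments; there is no fixed schema.
comments = [
    "Great product, fast delivery!",
    "Delivery was slow but the product is great.",
    "Product arrived damaged.",
]

# Split each comment into lowercase words, ignoring punctuation.
words = []
for comment in comments:
    words.extend(re.findall(r"[a-z]+", comment.lower()))

# Count how often each word occurs across all comments.
word_counts = Counter(words)
print(word_counts.most_common(3))
```

Images and videos need far heavier processing, but the idea is the same: structure has to be extracted before the data can be analyzed.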


Semi-structured
• Semi-structured data can contain both forms of data. It appears
structured in form, but it is not actually defined by, for example,
a table definition in a relational DBMS. An example of
semi-structured data is data represented in an XML file.
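
The sketch below parses a small XML document with Python's standard `xml.etree.ElementTree` module. The XML content is hypothetical (the names are borrowed from the employee table, the e-mail address is invented); it is meant to show that tags label each value even though the records do not share one fixed set of fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: each record is tagged, but the fields differ per record,
# unlike rows governed by a relational table definition.
XML_TEXT = """
<employees>
  <employee id="2365">
    <name>Rajesh Kulkarni</name>
    <department>Finance</department>
  </employee>
  <employee id="3398">
    <name>Pratibha Joshi</name>
    <email>pratibha@example.com</email>
  </employee>
</employees>
"""

root = ET.fromstring(XML_TEXT.strip())

# Turn every <employee> element into a dict of whatever fields it contains.
records = []
for employee in root.findall("employee"):
    record = {"id": employee.get("id")}
    for field in employee:
        record[field.tag] = field.text
    records.append(record)

print(records)
```

The first record has a department and no e-mail, the second the reverse, yet both parse cleanly because the structure travels with the data.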



Thank you
