1.1 Data Science Introduction: Definition, Components, and Applications
What is data science?
The term “data science” was coined in 2001, attempting to describe a new field. Some argue that it’s
nothing more than the natural evolution of statistics, and shouldn’t be called a new field at all. But
others argue that it’s more interdisciplinary. For example, in The Data Science Design Manual (2017),
Steven Skiena says the following.
Data science lies at the intersection of computer science, statistics, and
substantive application domains. From computer science comes machine learning and high-
performance computing technologies for dealing with scale. From statistics comes a long tradition
of exploratory data analysis, significance testing, and visualization. From application domains in
business and the sciences come challenges worthy of battle, and evaluation standards to assess
when they have been adequately conquered.
This echoes a famous 2010 blog post by Drew Conway, called The Data Science Venn Diagram, in which he drew a Venn diagram of the fields whose overlap forms what we call "data science": hacking skills, mathematics and statistics knowledge, and substantive expertise.
Most of the scientific methods that power data science are not new; they have been out there, waiting for applications to be developed, for a long time. Statistics is an old science that stands on
the shoulders of eighteenth century giants such as Pierre Simon Laplace (1749–1827) and Thomas
Bayes (1701–1761). Machine learning is younger, but it has already moved beyond its infancy and
can be considered a well-established discipline. Computer science changed our lives several decades
ago and continues to do so; but it cannot be considered new. So, why is data science seen as a novel
trend within business reviews, in technology blogs, and at academic conferences? The novelty of
data science is not rooted in the latest scientific knowledge, but in a disruptive change in our society
that has been caused by the evolution of technology: datification.
Datification is the process of rendering into data aspects of the world that have never been
quantified before. At the personal level, the list of datified concepts is very long and still growing:
business networks, the lists of books we are reading, the films we enjoy, the food we eat, our
physical activity, our purchases, our driving behavior, and so on. Even our thoughts are datified
when we publish them on our favorite social network; and in the not-so-distant future, our gaze could be datified by wearable devices that record what we see. At the business level, companies are datifying
semi-structured data that were previously discarded: web activity logs, computer network activity,
machinery signals, etc. Unstructured data, such as written reports, e-mails, or voice recordings, are now being stored not only for archival purposes but also to be analyzed. However, datification is not
the only ingredient of the data science revolution. The other ingredient is the democratization of
data analysis. Large companies such as Google, Yahoo, IBM, or SAS were the only players in this field
when data science had no name. At the beginning of the century, the huge computational
resources of those companies allowed them to take advantage of datification by using analytical
techniques to develop innovative products and even to make decisions about their own business.
Today, the analytical gap between those companies and the rest of the world (companies and
people) is shrinking. Access to cloud computing allows any individual to analyze huge amounts of
data in short periods of time. Analytical knowledge is freely available, and most of the crucial algorithms needed to create a solution can be found as open-source implementations, because open-source development is the norm in this field. As a result, the possibility of using rich data to make evidence-based decisions is open to virtually any person or company.
Data science is commonly defined as a methodology by which actionable insights can be inferred
from data. This is a subtle but important difference with respect to previous approaches to data
analysis, such as business intelligence or exploratory statistics. Performing data science is a task
with an ambitious objective: to produce beliefs that are informed by data and can be used as the
basis of decision-making. In the absence of data, beliefs are uninformed and decisions, in the best
of cases, are based on best practices or intuition. The representation of complex environments by
rich data opens up the possibility of applying all the scientific knowledge we have regarding how to
infer knowledge from data.
In general, data science allows us to adopt four different strategies to explore the world using data:
1. Probing reality. Data can be gathered by passive or active methods. In the latter case, the data represent the response of the world to our actions, and analysis of those responses can be extremely valuable when deciding on our subsequent actions. One of the best examples of this strategy is the use of A/B testing in web development: what is the best button size and color? The best answer can only be found by probing the world (see the first sketch after this list).
2. Pattern discovery. Divide and conquer is an old heuristic for solving complex problems, but it is not always easy to decide how to apply this common sense to a given problem. Datified problems can be analyzed automatically to discover useful patterns and natural clusters that can greatly simplify their solutions. Using this strategy to profile users is a critical ingredient in fields such as programmatic advertising and digital marketing (see the second sketch after this list).
3. Predicting future events. Since the early days of statistics, one of the most important scientific questions has been how to build robust data models capable of predicting future data samples. Predictive analytics allows decisions to be taken proactively, in anticipation of future events, not only reactively. Of course, it is not possible to predict the future in every environment, and there will always be unpredictable events; but the identification of predictable events in itself represents valuable knowledge. For example, predictive analytics can be used to optimize the tasks planned for retail store staff during the following week, by analyzing data such as weather, historical sales, traffic conditions, etc. (see the third sketch after this list).
4. Understanding people and the world. This is an objective that at the moment is beyond the scope
of most companies and people, but large companies and governments are investing considerable
amounts of money in research areas such as understanding natural language, computer vision,
psychology, and neuroscience. Scientific understanding of these areas is important for data science because, in the end, to make optimal decisions it is necessary to know the real processes
that drive people’s decisions and behavior. The development of deep learning methods for natural
language understanding and for visual object recognition is a good example of this kind of research.
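
To make the first strategy concrete, here is a minimal sketch of how an A/B test on a button design might be analyzed. All counts are invented, and the hand-rolled two-proportion z-test shown is just one common way to judge whether the observed difference exceeds chance:

```python
# A minimal A/B test sketch with made-up numbers: did the new button
# (variant B) convert better than the old one (variant A)?
from math import erfc, sqrt

# Hypothetical counts: visitors shown each variant and how many clicked.
visitors_a, clicks_a = 5000, 520   # variant A: old button
visitors_b, clicks_b = 5000, 585   # variant B: new button

rate_a = clicks_a / visitors_a
rate_b = clicks_b / visitors_b

# Two-proportion z-test: is the difference larger than chance would explain?
pooled = (clicks_a + clicks_b) / (visitors_a + visitors_b)
se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
z = (rate_b - rate_a) / se
p_value = erfc(abs(z) / sqrt(2))   # two-sided p-value from the normal CDF

print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z = {z:.2f}  p = {p_value:.4f}")
```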
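The second sketch illustrates pattern discovery: k-means clustering recovering natural groups in synthetic two-dimensional data. The data are artificial, and scikit-learn is assumed to be installed:

```python
# A toy pattern-discovery sketch: k-means finds natural clusters in
# synthetic 2-D data (requires scikit-learn; all data is artificial).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate three artificial "user segments" as points in 2-D space.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Each point now carries a discovered segment label (0, 1, or 2).
print("Cluster sizes:", [list(labels).count(c) for c in range(3)])
print("Cluster centers:\n", kmeans.cluster_centers_)
```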
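The third sketch illustrates predictive analytics in the spirit of the retail staffing example: a linear model fitted on invented historical data. The features, numbers, and model choice are all illustrative, and scikit-learn is assumed:

```python
# A minimal predictive-analytics sketch: fit a linear model on
# hypothetical historical data to estimate next week's sales
# (invented numbers; requires scikit-learn).
import numpy as np
from sklearn.linear_model import LinearRegression

# Features: [weekly temperature (°C), promo running (0/1)]; target: units sold.
X = np.array([[18, 0], [21, 1], [15, 0], [24, 1], [20, 0], [23, 1]])
y = np.array([120, 210, 95, 240, 130, 225])

model = LinearRegression().fit(X, y)

# Forecast for a warm week with a promotion planned.
next_week = np.array([[22, 1]])
print(f"Predicted units sold: {model.predict(next_week)[0]:.0f}")
```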
Applications of Data Science
• Fraud and risk detection
• Healthcare
• Airline route planning
• Image and speech recognition
• Augmented reality
Facets of data
In data science and big data you’ll come across many different types of data, and each
of them tends to require different tools and techniques. The main categories of data
are these:
■ Structured
■ Unstructured
■ Natural language
■ Machine-generated
■ Graph-based
■ Audio, video, and images
■ Streaming
Structured data
Structured data is data that depends on a data model and resides in a fixed field
within a record. As such, it’s often easy to store structured data in tables within databases
or Excel files.
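
As a small illustration, here is a sketch of structured data held in a table with pandas; the records and column names are made up:

```python
# Structured data fits naturally into rows and columns.
import pandas as pd

orders = pd.DataFrame(
    {
        "order_id": [1001, 1002, 1003],
        "customer": ["Ada", "Grace", "Alan"],
        "amount": [19.99, 42.50, 7.25],
    }
)

# Because every record has the same fixed fields, tabular operations
# such as filtering and aggregation are straightforward.
print(orders[orders["amount"] > 10])
print("Total revenue:", orders["amount"].sum())
```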
Unstructured data
Unstructured data is data that isn’t easy to fit into a data model because the content is
context-specific or varying. One example of unstructured data is your regular email. Although email contains structured elements such as the sender, title, and body text, it's a challenge to find, for example, the number of people who have written an email complaint about a specific employee, because so many ways exist to refer to a person.
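
A toy sketch of that exact difficulty, with fabricated email bodies and a fictional employee; even a hand-tuned pattern only catches the spellings we thought of:

```python
# A sketch of why unstructured text is hard: counting complaints about
# one (fictional) employee means anticipating many ways to name them.
import re

emails = [  # made-up free-text email bodies
    "Mr. J. Smith was very rude to me at the counter.",
    "I want to complain about John Smith's behavior yesterday.",
    "Your colleague Jon Smith ignored my refund request.",
    "Great service from Maria today, thank you!",
]

# A hand-written pattern catching a few spellings; real text always
# contains variants (typos, nicknames, pronouns) that rules will miss.
pattern = re.compile(r"\b(?:mr\.?\s+)?j(?:ohn|on|\.)?\s+smith\b", re.IGNORECASE)

complaints = [e for e in emails if pattern.search(e)]
print(f"{len(complaints)} of {len(emails)} emails mention that employee")
```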
Natural language
Natural language is a special type of unstructured data; it’s challenging to process
because it requires knowledge of specific data science techniques and linguistics.
The natural language processing community has had success in entity recognition,
topic recognition, summarization, text completion, and sentiment analysis, but models
trained in one domain don’t generalize well to other domains. Even state-of-the-art
techniques aren’t able to decipher the meaning of every piece of text. This shouldn’t
be a surprise though: humans struggle with natural language as well. It’s ambiguous
by nature. The concept of meaning itself is questionable here. Have two people listen
to the same conversation. Will they get the same meaning? The meaning of the same
words can vary when coming from someone upset or joyous.
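
A deliberately naive sentiment-analysis sketch (a tiny hand-made word list, not a real NLP model) shows both the basic idea and why ambiguity defeats simple rules:

```python
# Naive sentiment scoring with a tiny hand-made lexicon.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "angry"}

def naive_sentiment(text: str) -> int:
    """Score = positive word count minus negative word count."""
    words = text.lower().replace(".", "").replace(",", "").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(naive_sentiment("I love this great product"))    # 2: looks positive
print(naive_sentiment("I hate how much I love cake"))  # 0: mixed feelings
print(naive_sentiment("Oh great, it broke again"))     # 1: sarcasm fools it
```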
Machine-generated data
Machine-generated data is information that’s automatically created by a computer,
process, application, or other machine without human intervention. The analysis of machine data relies on highly scalable tools, due to its high volume and speed. Examples of machine data are web server logs, call detail records, network event logs, and telemetry.
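
A small sketch of handling such data: parsing fabricated web-server log lines in an Apache-style format with a regular expression:

```python
# Parsing invented web-server log lines in a common Apache-like format.
import re

LOG_LINE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+)')

logs = [  # fabricated log entries
    '10.0.0.1 - - [12/Mar/2024:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '10.0.0.2 - - [12/Mar/2024:10:15:33 +0000] "POST /login HTTP/1.1" 401 230',
]

for line in logs:
    m = LOG_LINE.match(line)
    if m:
        ip, ts, method, path, status, size = m.groups()
        print(f"{ip} {method} {path} -> {status} ({size} bytes)")
```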
Graph-based or network data
“Graph data” can be a confusing term because any data can be shown in a graph.
“Graph” in this case points to mathematical graph theory. In graph theory, a graph is a
mathematical structure to model pair-wise relationships between objects. Graph or
network data is, in short, data that focuses on the relationship or adjacency of objects.
Graph structures use nodes, edges, and properties to represent and store graph data. Graph-based data is a natural way to represent social networks, and its structure allows you to calculate specific metrics such as the influence of a person and the shortest path between two people. Examples of graph-based data can be found on many social media websites.
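
A sketch of those two metrics on an invented friendship network, assuming the networkx library is available:

```python
# Graph data: shortest path and a simple influence measure with networkx.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("Ann", "Bob"), ("Bob", "Carl"), ("Carl", "Dana"),
    ("Ann", "Eve"), ("Eve", "Dana"), ("Bob", "Eve"),
])

# Shortest chain of acquaintances between two people.
print(nx.shortest_path(G, "Ann", "Dana"))

# Degree centrality as a crude proxy for a person's influence.
print(nx.degree_centrality(G))
```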
Streaming data
While streaming data can take almost any of the previous forms, it has an extra
property. The data flows into the system when an event happens instead of being
loaded into a data store in a batch. Although this isn’t really a different type of data,
we treat it here as such because you need to adapt your process to deal with this type
of information.
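
A minimal sketch of that streaming mindset: each simulated event is processed as it arrives, and a running statistic is maintained without ever holding the full dataset in memory (the sensor values are random stand-ins):

```python
# Streaming: process events one at a time instead of loading a batch.
import random
import time

def sensor_stream(n_events: int):
    """Simulate events arriving over time (values are random stand-ins)."""
    for _ in range(n_events):
        time.sleep(0.1)            # pretend we are waiting for the next event
        yield random.uniform(15.0, 30.0)

# Maintain a running average without ever storing the full dataset.
count, mean = 0, 0.0
for reading in sensor_stream(5):
    count += 1
    mean += (reading - mean) / count   # incremental (online) mean update
    print(f"event {count}: value={reading:.2f} running mean={mean:.2f}")
```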
Components of Data Science
The main components of Data Science are given below:
1. Statistics: Statistics is one of the most important components of data science. It is the practice of collecting and analyzing large amounts of numerical data and extracting meaningful insights from them.
2. Domain expertise: Domain expertise binds data science together. It means specialized knowledge or skills in a particular area, and data science projects need domain experts from the various areas they serve.
3. Data engineering: Data engineering is the part of data science that involves acquiring, storing, retrieving, and transforming data. It also includes adding metadata (data about data) to the data.
4. Visualization: Data visualization means representing data in a visual context so that people can easily understand its significance. It makes large amounts of data far easier to take in at a glance.
5. Advanced computing: Advanced computing does the heavy lifting of data science. It involves designing, writing, debugging, and maintaining the source code of computer programs.
6. Mathematics: Mathematics is a critical part of data science. It involves the study of quantity, structure, space, and change. For a data scientist, a good knowledge of mathematics is essential.
7. Machine learning: Machine learning is the backbone of data science. It is all about training a machine on data so that it can learn to perform tasks without being explicitly programmed. In data science, we use various machine learning algorithms to solve problems (see the sketch after this list).
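
As a minimal illustration of that last component, here is a sketch that trains a classifier on the classic iris dataset and checks its accuracy, assuming scikit-learn is installed:

```python
# Train a simple classifier and evaluate it on held-out data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(f"Test accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")
```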
Tools for Data Science
The following are some tools commonly used in data science:
Data analysis tools: R, Python, Statistics, SAS, Jupyter, RStudio, MATLAB, Excel, RapidMiner.
Data warehousing: ETL, SQL, Hadoop, Informatica/Talend, Amazon Redshift.
Data visualization tools: R, Jupyter, Tableau, Cognos.
Machine learning tools: Spark, Mahout, Azure ML Studio.