
Introduction to Data Science

Tanmay Basu

Department of Data Science and Engineering


IISER Bhopal, India

Tanmay Basu Introduction to Data Science 1


What is Data Science?
"The ability to take data - to be able to understand it, to
process it, to extract value from it, to visualize it, to
communicate it - that's going to be a hugely important skill
in the next decades." [1]
- Prof. Hal Varian, Chief Economist at Google and Emeritus Professor at
UC Berkeley School of Information, 2009.

▶ Data science is an interdisciplinary field focused on extracting
knowledge from large volumes of data to solve problems in
a wide range of application domains.

▶ It might therefore imply a focus involving data and, by
extension, statistics, or the systematic study of the
organization, properties, and analysis of data and its role in
inference, including our confidence in the inference.

[1] https://ischoolonline.berkeley.edu/data-science/what-is-data-science
Why Data Science?
• Why do we need a new term like data science when we have
had statistics and database systems for decades?

• Data is increasingly heterogeneous and mostly unstructured
(e.g., text, video), and it often emanates from
networks with complex relationships between their entities.

• Traditional database models are somewhat inadequate for
knowledge discovery in data.

• Statistics emphasizes quantitative data and description,
whereas data science deals with both quantitative and qualitative
data (e.g., images) and emphasizes prediction and action.

• Thus we need to develop learning methodologies to extract
knowledge and insights from data.
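
The contrast between description and prediction drawn above can be illustrated with a minimal sketch. The study-hours data and all names below are hypothetical, invented for illustration, and the least-squares line is only one simple choice of predictive model, not a method prescribed by these slides.

```python
from statistics import mean

# Hypothetical toy data: hours studied and exam score for four students.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]

# Description: summarize the data that was observed.
average_score = mean(scores)

# Prediction: fit a least-squares line score = slope * hours + intercept
# and use it to estimate the score for an input that was not observed.
mean_h, mean_s = mean(hours), mean(scores)
s_xy = sum((h - mean_h) * (s - mean_s) for h, s in zip(hours, scores))
s_xx = sum((h - mean_h) ** 2 for h in hours)
slope = s_xy / s_xx
intercept = mean_s - slope * mean_h


def predict(h: float) -> float:
    """Estimate the score for h hours of study using the fitted line."""
    return slope * h + intercept


predicted_score = predict(5.0)
print(f"average observed score: {average_score}")
print(f"predicted score for 5 hours: {predicted_score}")
```

The average only summarizes the four observed students, whereas the fitted line produces an estimate for a fifth, unseen case, which is the kind of output a decision or action can be based on.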
Data Science Life Cycle

Source: https://ischoolonline.berkeley.edu/data-science/what-is-data-science
Role of Data Scientist

▶ Data Scientists are able to identify relevant questions, collect data from
a multitude of different data sources, organize the information, translate
results into solutions, and communicate their findings in a way that
positively affects business decisions.

▶ Data Analysts bridge the gap between data scientists and business
analysts. They are provided with the questions that need answering from
an organization and then organize and analyze data to find results that
align with high-level business strategy.

▶ Data Engineers manage exponentially growing amounts of rapidly changing data.
They focus on the development, deployment, management, and
optimization of data pipelines and infrastructure to transform and
transfer data to data scientists for querying.

Source: https://ischoolonline.berkeley.edu/data-science/what-is-data-science



What is Data?

• Data are individual facts, statistics, or items of information,
often numeric.

• In a more technical sense, data are a set of values of
qualitative or quantitative variables about one or more persons
or objects.

• Data are measured, collected, reported, and analyzed, and
used to create data visualizations such as graphs, tables, or
images.

• In data science, data can be viewed as a combination of
distinct variables grouped together.
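
A minimal sketch of this view of data as grouped variables, assuming a hypothetical record type `Person`. The field names and values are invented for illustration and do not come from the references.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class Person:
    """One observation: several distinct variables grouped together."""
    name: str          # qualitative variable (label)
    blood_group: str   # qualitative variable (category)
    height_cm: float   # quantitative variable (measurement)
    age: int           # quantitative variable (count of years)


# A small hypothetical data set: one record per person.
records = [
    Person("A", "O+", 162.0, 21),
    Person("B", "A+", 175.5, 24),
    Person("C", "O+", 168.5, 22),
]

# Quantitative variables support arithmetic summaries such as the mean.
mean_height = mean(r.height_cm for r in records)

# Qualitative variables are summarized by counting categories instead.
group_counts = Counter(r.blood_group for r in records)

print(f"mean height (cm): {mean_height:.2f}")
print(f"blood group counts: {dict(group_counts)}")
```

Each record groups the values of several variables about one person, and the kind of variable determines which summaries are meaningful: averaging heights makes sense, while blood groups can only be counted.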



References

▶ Vasant Dhar. Data Science and Prediction. Communications
of the ACM, 56(12): 64–73, 2013.
https://dl.acm.org/doi/pdf/10.1145/2500499

▶ Ilkka Tuomi. Data is More Than Knowledge. Journal of
Management Information Systems, 16(3): 103–117, 2000.

▶ OECD Glossary of Statistical Terms, OECD, p. 119, 2008.
ISBN 978-92-64-02556-1.



Thank You

