
What is Data Science?

By Medono Zhasa. Last updated on Oct 1, 2019

Data science or data-driven science enables better decision making, predictive analysis,
and pattern discovery. It lets you:

• Find the leading cause of a problem by asking the right questions

• Perform an exploratory study of the data

• Model the data using various algorithms

• Communicate and visualize the results via graphs, dashboards, etc.

In practice, data science is already helping the airline industry predict disruptions in
travel to alleviate the pain for both airlines and passengers. With the help of data
science, airlines can optimize operations in many ways, including:

• Plan routes and decide whether to schedule direct or connecting flights

• Build predictive analytics models to forecast flight delays

• Offer personalized promotional offers based on customers’ booking patterns

• Decide which class of planes to purchase for better overall performance

In another example, let’s say you want to buy new furniture for your office. When
looking online for the best option and deal, you should answer some critical questions
before making your decision.

Arranging those questions as a decision tree lets you narrow down your selection to a few
websites and, ultimately, make a more informed final decision.

Difference Between Business Intelligence and Data Science

Business intelligence is a combination of the strategies and technologies used for the
analysis of business data/information. Like data science, it can provide historical,
current, and predictive views of business operations. However, there are some key
differences.

Business Intelligence | Data Science
Uses structured data | Uses both structured and unstructured data
Analytical in nature: provides a historical report of the data | Scientific in nature: performs an in-depth statistical analysis of the data
Uses basic statistics with emphasis on visualization (dashboards, reports) | Leverages more sophisticated statistical and predictive analysis and machine learning (ML)
Compares historical data to current data to identify trends | Combines historical and current data to predict future performance and outcomes

Prerequisites for Data Science

• Curiosity - To understand the business problem, you first need to ask the right
questions. Asking the wrong ones is why many data science projects fail

• Common Sense - To identify new ways to prioritize and solve business problems,
you need common sense. Even if you have an incomplete dataset, you need to be
creative by filling in any gaps on your own

• Communication Skills - Even if your analysis is superb, you need to be able to
communicate your findings effectively; otherwise nobody else will know


Machine Learning

Machine learning is the backbone of data science. Data scientists need to have a solid
grasp of ML in addition to basic knowledge of statistics.

Modeling

Mathematical models enable you to make quick calculations and predictions based on
what you already know about the data. Modeling is also a part of ML and involves
identifying which algorithm is the most suitable to solve a given problem and how to
train these models.

Statistics

Statistics is at the core of data science. A firm grasp of statistics can help you
extract more intelligence and obtain more meaningful results.

Programming

Some level of programming is required to execute a successful data science project.
The most common programming languages are Python and R. Python is especially
popular because it’s easy to learn, and it supports multiple libraries for data science and
ML.

Databases

As a capable data scientist, you need to understand how databases work, how to manage
them, and how to extract data from them.


Tools/Skills Used in Data Science

Field | Skills | Tools
Data Analysis | R, Python, Statistics | SAS, Jupyter, RStudio, MATLAB, Excel, RapidMiner
Data Warehousing | ETL, SQL, Hadoop, Apache Spark | Informatica/Talend, AWS Redshift
Data Visualization | R, Python libraries | Jupyter, Tableau, Cognos, RAW
Machine Learning | Python, Algebra, ML Algorithms, Statistics | Spark MLlib, Mahout, Azure ML Studio

What Does a Data Scientist Do?

A data scientist analyzes business data to extract meaningful insights. In other words, a
data scientist solves business problems through a series of steps, including:

• Ask the right questions to understand the problem

• Gather data from multiple sources, such as enterprise data and public data

• Process raw data and convert it into a format suitable for analysis

• Feed the data into the analytic system, either an ML algorithm or a statistical model

• Prepare the results and insights to share with the appropriate stakeholders

Must-Know Machine Learning Algorithms

The most basic and essential ML algorithms a data scientist uses include:

Regression

Regression is an ML technique based on supervised learning. The output of
regression is a real or continuous value, for example, the predicted temperature of a
room.

Clustering

Clustering is an ML technique based on unsupervised learning. It works on a
set of unlabeled data points and groups each data point into a cluster.
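
As a rough illustration of grouping unlabeled points, here is a minimal k-means sketch in plain Python. The article does not name a specific clustering algorithm, so k-means is an assumption chosen for simplicity, and the sample points and the `kmeans` helper are hypothetical.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters with plain k-means (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda j: (x - centroids[j][0]) ** 2 + (y - centroids[j][1]) ** 2)
            for x, y in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids

# Hypothetical unlabeled data: two loose groups of points.
points = [(1.0, 1.1), (0.9, 1.3), (1.2, 0.8), (8.0, 8.2), (8.3, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The first three points should end up sharing one label and the last three the other; which group gets label 0 depends on the random starting centroids.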

Decision Tree

A decision tree refers to a supervised learning method used primarily for classification.
The algorithm classifies the various inputs according to a specific parameter. The most
significant advantage of a decision tree is that it is easy to understand, and it clearly
shows the reason for its classification.
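
To make the "easy to understand" point concrete, the sketch below learns a one-level decision tree (a decision stump) in plain Python. It is a simplified stand-in for a full tree, the Gini impurity criterion is an assumption (the article does not say how splits are chosen), and the furniture-style data and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def majority(labels):
    """Most common label in a list."""
    return max(set(labels), key=labels.count)

def fit_stump(rows, labels):
    """Find the single (feature, threshold) split with the lowest weighted Gini impurity.

    Assumes at least one split separates the rows into two non-empty groups.
    """
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold, majority(left), majority(right))
    _, f, threshold, left_label, right_label = best
    return {"feature": f, "threshold": threshold, "left": left_label, "right": right_label}

def predict(stump, row):
    """Classify one row by checking which side of the threshold it falls on."""
    side = "left" if row[stump["feature"]] <= stump["threshold"] else "right"
    return stump[side]

# Hypothetical data: [price in dollars, delivery days] -> buy or skip.
rows = [[120, 3], [150, 10], [300, 2], [320, 9], [90, 12], [280, 4]]
labels = ["buy", "skip", "buy", "skip", "skip", "buy"]
stump = fit_stump(rows, labels)
print(stump)
```

The learned rule can be read directly from the returned dictionary (which feature, which threshold, which label on each side), and that readability is the advantage described above.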

Support Vector Machines

Support vector machines (SVMs) are also a supervised learning method used primarily
for classification. SVMs can perform both linear and non-linear classifications.

Naive Bayes

Naive Bayes is a statistical probability-based classification method best used for binary
and multi-class classification problems.
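
The probability-based idea can be sketched with a tiny multinomial naive Bayes text classifier in plain Python. The spam/ham messages, the function names, and the use of add-one (Laplace) smoothing are assumptions made for illustration; the article does not prescribe them.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents, labels):
    """Count documents per class and words per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in zip(documents, labels):
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary

def classify(model, words):
    """Return the class with the highest log-posterior for a list of words."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in words:
            # Add-one smoothing avoids zero probabilities for unseen words.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training messages, already split into words.
documents = [
    ["cheap", "offer", "now"],
    ["meeting", "at", "noon"],
    ["cheap", "pills", "offer"],
    ["lunch", "meeting", "tomorrow"],
]
labels = ["spam", "ham", "spam", "ham"]
model = train_naive_bayes(documents, labels)
print(classify(model, ["cheap", "offer"]))
```

The same code handles more than two classes unchanged, since it simply scores every label it saw during training.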

The Lifecycle of a Data Science Project

Concept Study

The first phase of a data science project is the concept study. The goal of this step is to
understand the problem by performing a study of the business model.

For example, let’s say you are trying to predict the price of a 1.35-carat diamond. In this
case, you need to understand the terminology used in the industry and the business
problem, and then collect enough relevant data about the industry.

Data Preparation

Since raw data may not be usable, data preparation is the most crucial aspect of the
data science lifecycle. A data scientist must first examine the data to identify any gaps
or data that do not add any value. During this process, you must go through several
steps, including:

• Data integration - Resolve any conflicts in the dataset and eliminate redundancies

• Data transformation - Normalize, transform and aggregate data using ETL (extract,
transform, load) methods

• Data reduction - Using various strategies, reduce the size of data without impacting
the quality or outcome

• Data cleaning - Correct inconsistent data by filling in missing values and smoothing
out noisy data
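
A few of these steps can be sketched in plain Python. The example below removes duplicate rows, fills missing values with the column mean, and rescales each column to the range 0 to 1. These particular choices (mean imputation, min-max scaling) and the `prepare` helper are assumptions for illustration; real projects pick strategies to suit the data.

```python
def prepare(records):
    """Deduplicate rows, fill missing values with the column mean, then min-max normalize.

    Assumes every column has at least one known (non-None) numeric value.
    """
    # Data integration: drop exact duplicate rows while keeping the original order.
    unique = list(dict.fromkeys(tuple(r) for r in records))
    columns = list(zip(*unique))
    cleaned = []
    for col in columns:
        known = [v for v in col if v is not None]
        mean = sum(known) / len(known)
        # Data cleaning: replace each missing value (None) with the column mean.
        filled = [mean if v is None else v for v in col]
        # Data transformation: rescale the column to the range [0, 1].
        lo, hi = min(filled), max(filled)
        span = hi - lo
        cleaned.append([(v - lo) / span if span else 0.0 for v in filled])
    return [list(row) for row in zip(*cleaned)]

# Hypothetical raw data: one duplicate row and one missing value.
raw = [[1.0, 200.0], [2.0, None], [1.0, 200.0], [3.0, 400.0]]
prepared = prepare(raw)
print(prepared)  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```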

Model Planning

After you have cleaned up the data, you must choose a suitable model. The model you
want must match the nature of the problem—is it a regression problem, or a
classification one? This step also involves an Exploratory Data Analysis (EDA) to
provide a more in-depth analysis of the data and understand the relationship between
the variables. Some techniques used for EDA are histograms, box plots, trend analysis,
etc.

Using these techniques on the diamond example, we can quickly discover that the
relationship between a diamond's carat weight and its price is linear.

Then, split the data into a training set, used to train the model, and a testing set,
used to validate it. If the model does not perform accurately on the testing data, you
will need to retrain it or use another model. If it does, you can put it into
production.
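
A minimal sketch of such a split in plain Python follows. The 25% test fraction, the fixed seed, and the `train_test_split` name are assumptions chosen for illustration.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split it into training and testing lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of 20 rows, represented here by their indices.
rows = list(range(20))
train, test = train_test_split(rows)
print(len(train), len(test))  # 15 5
```

Shuffling before splitting guards against any ordering in the source data leaking into one of the two sets.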

The various tools used for model planning are:

• R - R can be used for both regular statistical analysis and machine learning analysis,
including visualization for more detailed analysis

• Python - Python offers a rich set of libraries for performing data analysis and machine
learning

• MATLAB - MATLAB is a popular tool and one of the easiest to learn

• SAS - SAS is a powerful proprietary tool that has all the components required to
perform a complete statistical analysis

Model Building

The next step in the lifecycle is to build the model. Using various analytical tools and
techniques, you can manipulate the data with the goal of ‘discovering’ useful
information.

In this case, we want to predict the price of a 1.35-carat diamond. We can fit a linear
regression model to the pricing data we have and use it to make that prediction.

Linear regression describes the relation between 2 variables - X and Y. After the
regression line is drawn, we can predict a Y value for an input X value using the
formula:

Y = mX + c

where

m = slope of the line
c = y-intercept

If you can validate that the model is working correctly, then you can go to the next
level: production. If not, you need to retrain the model with more data or use a different
model or algorithm, and then repeat the process. In Python, libraries such as pandas,
NumPy, and Matplotlib help you prepare the data, fit simple models, and plot the results.
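
The line Y = mX + c can be fitted with ordinary least squares, sketched here in plain Python. The carat and price pairs are made up for illustration and lie exactly on a line so the result is easy to check; they are not real market data, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of Y = mX + c; returns (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    m = covariance / variance      # slope of the line
    c = mean_y - m * mean_x        # y-intercept
    return m, c

# Hypothetical (carat, price) pairs generated from price = 3000 * carat + 200.
carats = [0.5, 0.75, 1.0, 1.25, 1.5, 2.0]
prices = [1700, 2450, 3200, 3950, 4700, 6200]

m, c = fit_line(carats, prices)
predicted = m * 1.35 + c  # predicted price of a 1.35-carat diamond
print(round(m, 2), round(c, 2), round(predicted, 2))
```

With real, noisy pricing data the fitted slope and intercept would only approximate the underlying trend, which is why the validation step on held-out testing data matters.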

Communication

The next step is to extract the key findings of the study and convey them to the
stakeholders. A good data scientist should be able to communicate their findings to a
business-minded audience, including details about the steps taken to solve the problem.

Operationalize

Once all parties accept the findings, they are put into practice. In this phase, the
stakeholders also receive the final reports, code, and technical documents.

Career Options for a Data Scientist

The demand for data scientists is massive, but the supply is insufficient. With millions of
worldwide job openings, the role of a data scientist has become one of the hottest jobs
of the decade. While data science is present in all industries, the demand for data
science is exceptionally high in the technology, marketing, finance, healthcare, and
gaming industries.


About the Author

Medono Zhasa
Medo specializes in writing for the digital space to garner social media attention and
increase search visibility. A writer by day and reader by night, Medo has a second life
writing Lord of the Rings fan theories and making cat videos for people of the Internet to
relish.
