DSUR Notes-1

Data Science Using R
Types of Data:
“Data is the new oil.” Today, data is everywhere, in every field. Whether you are a
data scientist, marketer, businessperson, data analyst, researcher, or in any
other profession, you need to work with raw or structured data. Because data is
so valuable, it must be handled and stored properly, without errors. To process
data correctly and get the right results, you first need to know what type of data
you are working with. There are two broad types of data, qualitative and
quantitative, which are further classified into four categories:
• Nominal data.
• Ordinal data.
• Discrete data.
• Continuous data.
Today, business runs on data. Most companies use data-driven insights to
create and launch campaigns, design strategies, launch products and services, and
try out new ideas. According to a report, at least 2.5 quintillion bytes of data
are produced every day.
Qualitative or Categorical Data
Qualitative or categorical data is data that can’t be measured or counted in the
form of numbers. It is sorted by category, not by number, which is why it is also
known as categorical data. It can consist of audio, images, symbols, or text. The
gender of a person, i.e., male, female, or others, is qualitative data.
Qualitative data captures how people perceive things. It helps market
researchers understand customers’ tastes and then design their ideas and
strategies accordingly.
Nominal Data
Nominal data is used to label variables without any order or quantitative value.
Hair colour can be considered nominal data, because no colour ranks above or below
another.
The name “nominal” comes from the Latin word “nomen,” which means “name.”
Nominal data cannot be used for arithmetic, and its values cannot be sorted into a
meaningful order. They are simply distributed into distinct categories.
Examples of Nominal Data:
• Colour of hair (Blonde, Red, Brown, Black, etc.)
• Marital status (Single, Widowed, Married)
• Nationality (Indian, German, American)
• Gender (Male, Female, Others)
• Eye Color (Black, Brown, etc.)
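Nominal values have no order, so about the only summaries that make sense are category counts and the most frequent category (the mode). Below is a minimal Python sketch that uses a small hypothetical list of hair colours. The data is invented for illustration.

```python
from collections import Counter

# Hypothetical sample of hair colours (nominal data: labels with no order)
hair_colours = ["Black", "Brown", "Black", "Blonde", "Red", "Brown", "Black"]

# Counting how often each category occurs is a valid operation on nominal data
counts = Counter(hair_colours)

# The mode (most frequent category) is the usual "average" for nominal data
mode_colour, mode_count = counts.most_common(1)[0]

print(counts["Brown"])          # 2
print(mode_colour, mode_count)  # Black 3
```

Sorting these labels alphabetically would run, but the resulting order carries no meaning. In R, the same counts are usually obtained by calling table() on a factor.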
Ordinal Data
Ordinal data has a natural order: each value occupies a position on a scale. It is
used to record observations such as customer satisfaction or happiness. Arithmetic
on the values is not meaningful, however, because the gaps between positions need
not be equal.
Ordinal data is qualitative data whose values have a relative position, so it can be
considered “in-between” qualitative and quantitative data. Because ordinal data
shows only the sequence of values, it supports order-based summaries such as the
median and ranks, but not means or differences. Unlike nominal data, ordinal data
has a meaningful order.
Examples of Ordinal Data:
• Feedback, experience, or satisfaction ratings that companies collect on a
scale of 1 to 10
• Letter grades in the exam (A, B, C, D, etc.)
• Ranking of people in a competition (First, Second, Third, etc.)
• Economic Status (High, Medium, and Low)
• Education Level (Higher, Secondary, Primary)
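The Python sketch below encodes an ordinal variable by attaching an explicit rank to each level. The levels reuse the economic-status example above, and the responses are hypothetical. Order-based operations such as sorting and taking the median are meaningful. Averaging the ranks would not be.

```python
# Economic status levels, listed from lowest to highest
LEVELS = ["Low", "Medium", "High"]
RANK = {level: position for position, level in enumerate(LEVELS)}

# Hypothetical survey responses
responses = ["Medium", "High", "Low", "Medium", "High"]

# Sorting by rank is meaningful because the levels have a natural order
ordered = sorted(responses, key=lambda level: RANK[level])

# Median: the middle value of the sorted list (odd number of responses here)
median_level = ordered[len(ordered) // 2]

print(ordered)       # ['Low', 'Medium', 'Medium', 'High', 'High']
print(median_level)  # Medium
print(RANK["High"] > RANK["Low"])  # True
```

In R, factor(x, levels = ..., ordered = TRUE) plays the same role as the explicit rank table here.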
Quantitative Data
Quantitative data is expressed as numerical values, which makes it countable or
measurable and suitable for statistical analysis. This kind of data is also known as
numerical data. It answers questions such as “how much,” “how many,” and “how
often.” For example, the price of a phone, a computer’s RAM, and the height or
weight of a person all fall under quantitative data.
Quantitative data can be used for statistical manipulation. These data can be
represented on a wide variety of graphs and charts, such as bar graphs,
histograms, scatter plots, boxplots, pie charts, line graphs, etc.
Examples of Quantitative Data:
• Height or weight of a person or object
• Room Temperature
• Scores and Marks (Ex: 59, 80, 60, etc.)
• Time
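Because quantitative data is numeric, arithmetic summaries are meaningful. This minimal Python sketch computes a few of them for a hypothetical set of exam marks. Three of the values come from the example above, and the other two are invented.

```python
import statistics

# Hypothetical exam marks (quantitative data)
marks = [59, 80, 60, 72, 91]

mean_mark = statistics.mean(marks)      # arithmetic mean
median_mark = statistics.median(marks)  # middle value after sorting
mark_range = max(marks) - min(marks)    # spread from lowest to highest

print(mean_mark)    # 72.4
print(median_mark)  # 72
print(mark_range)   # 32
```

In R, mean(), median(), and range() give the same summaries.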
Quantitative data is further classified into two types:
1) Discrete Data
The term discrete means distinct or separate. Discrete data takes only separate,
countable values, typically integers or whole numbers. The total number of students
in a class is an example of discrete data: it can’t be broken into decimal or
fractional values.
Discrete data is countable, and its values cannot be subdivided. It is represented
mainly by a bar graph, number line, or frequency table.
Examples of Discrete Data:
• Total number of students present in a class
• Cost of a cell phone
• Number of employees in a company
• Total number of players who participated in a competition
• Days in a week
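A frequency table, one of the representations mentioned above, simply counts how often each distinct value occurs. This Python sketch builds one for hypothetical daily attendance figures.

```python
from collections import Counter

# Hypothetical daily attendance in one class over a week (discrete counts)
students_present = [28, 30, 28, 29, 30, 30, 27]

# A frequency table: how many days each attendance count occurred
frequency = Counter(students_present)

for count in sorted(frequency):
    print(count, frequency[count])
# 27 1
# 28 2
# 29 1
# 30 3
```

Every possible value is a whole number, so each one gets its own row. No grouping into intervals is needed.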
2) Continuous Data
Continuous data can take fractional values, such as the height of a person or the
length of an object. It represents measurements that can be divided into ever
smaller units, and a continuous variable can take any value within a range.
The key difference between discrete and continuous data is that discrete data
takes only separate values, typically whole numbers, whereas continuous data can
take fractional values. Continuous data is used to record quantities such as
temperature, height, width, time, and speed.
Examples of Continuous Data:
• Height of a person
• Speed of a vehicle
• “Time-taken” to finish the work
• Wi-Fi Frequency
• Market share price
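A continuous variable can take any value in a range, so it is usually summarised by grouping values into intervals. A histogram does exactly this. The Python sketch below uses hypothetical heights and an assumed bin width of 10 cm.

```python
# Hypothetical heights in centimetres (continuous data)
heights = [162.4, 170.1, 158.9, 175.6, 168.2, 181.3, 165.0]
bin_width = 10.0  # assumed interval width

# Group each measurement into a 10 cm interval [lower, lower + bin_width)
bins = {}
for h in heights:
    lower = (h // bin_width) * bin_width
    bins[lower] = bins.get(lower, 0) + 1

for lower in sorted(bins):
    print(f"{lower:.0f}-{lower + bin_width:.0f} cm: {bins[lower]}")
# 150-160 cm: 1
# 160-170 cm: 3
# 170-180 cm: 2
# 180-190 cm: 1
```

In R, cut() performs this kind of binning and hist() draws the resulting histogram.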
Analytics
What does Analytics Mean?
Analytics is the scientific process of discovering and communicating the
meaningful patterns that can be found in data. It is concerned with turning raw
data into insight for making better decisions. Analytics relies on the application
of statistics, computer programming, and operations research to quantify data and
gain insight into its meaning. It is especially useful in areas that record a lot of
data or information.
Analytics provides meaningful information that might otherwise stay hidden
within large quantities of data. Any leader, manager, or just about anyone else can
make use of it, especially in today’s data-driven world.
Information has long been considered a great weapon, and analytics is the
forge that creates it. Analytics changes everything, not just in the world of business,
but also in science, sports, health care, and just about any field where vast amounts
of data are collected.
Analytics helps us find hidden patterns in the world around us, from
consumer behavior and athlete and team performance to connections
between activities and diseases. This can change how we look at the world,
usually for the better. We sometimes assume that a process is already working at
its best, but the data may tell us otherwise. Analytics helps us spot and make
such improvements.
In the world of business, organizations usually apply analytics to describe,
predict, and then improve the company’s business performance. Specifically,
analytics helps in the following areas:
• Web analytics
• Fraud analysis
• Risk analysis
• Advertisement and marketing
• Enterprise decision management
• Market optimization
• Market modelling
Types of Analytics
As organizations collect more data, what they use it for and how they analyze and
interpret it become more nuanced. Data without analytics doesn’t make much
sense, but analytics is a broad term that can mean many different things
depending on where you sit on the data analytics maturity model.
Modern analytics tends to fall into four distinct categories: descriptive, diagnostic,
predictive, and prescriptive. How do you know which kind of analytics to use,
when to use it, and why?
Understanding the what, why, when, where, and how of your data analytics helps
to drive better decision making and enables your organization to meet its business
objectives.
Four Types of Analytics

What is Descriptive Analytics?
Descriptive analytics answers the question, “What happened?” It is by far the most
commonly used type of analytics, providing reporting and analysis centered on past
events. It helps companies understand things such as:
• How much did we sell as a company?
• What was our overall productivity?
• How many customers churned in the last quarter?
Descriptive analytics is used to understand overall performance at an
aggregate level. It is by far the easiest place for a company to start, because the
data needed to build reports and applications tends to be readily available.
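At its core, descriptive analytics aggregates past records. The Python sketch below answers "How much did we sell as a company?" for a handful of hypothetical sales records, first as one total and then by quarter. The record layout (quarter, region, amount) is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical sales records: (quarter, region, amount)
sales = [
    ("Q1", "North", 1200.0),
    ("Q1", "South", 800.0),
    ("Q2", "North", 950.0),
    ("Q2", "South", 1100.0),
]

# "How much did we sell as a company?" -> one overall total
total_sales = sum(amount for _, _, amount in sales)

# The same question, broken down by quarter
sales_by_quarter = defaultdict(float)
for quarter, _, amount in sales:
    sales_by_quarter[quarter] += amount

print(total_sales)             # 4050.0
print(dict(sales_by_quarter))  # {'Q1': 2000.0, 'Q2': 2050.0}
```

In R, aggregate() or tapply() give the same grouped totals.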
It’s extremely important to build core competencies in descriptive analytics first,
before attempting to move up the data analytics maturity model. Core
competencies include:
• Data modeling fundamentals and the adoption of basic star schema best
practices,
• Communicating data with the right visualizations, and
• Basic dashboard design skills.
What is Diagnostic Analytics?
Diagnostic analytics, just like descriptive analytics, uses historical data to answer a
question. But instead of focusing on “what” happened, diagnostic analytics addresses
the critical question of why an occurrence or anomaly occurred within your data.
Diagnostic analytics also happens to be the most overlooked and skipped step
within the analytics maturity model. Anecdotally, most organizations try to go
from “what happened” to “what will happen” without ever taking the time to
address the “why did it happen” step. This type of analytics helps companies
answer questions such as:
• Why did our company sales decrease in the previous quarter?
• Why are we seeing an increase in customer churn?
• Why is a specific basket of products vastly outperforming its prior year
sales figures?
Diagnostic analytics tends to be more accessible and to fit a wider range of use
cases than machine learning/predictive analytics. You might even find that it solves
some business problems you had earmarked for predictive analytics.
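Diagnostic work often starts by breaking an aggregate change down into its parts. The Python sketch below approaches "Why did our company sales decrease?" by computing each region's contribution to the change. The regions and figures are hypothetical. A real diagnosis would examine many more dimensions and check that the pattern is not due to chance.

```python
# Hypothetical sales per region in two consecutive quarters
previous = {"North": 1200.0, "South": 800.0, "West": 500.0}
current = {"North": 1180.0, "South": 520.0, "West": 510.0}

# Descriptive step: what happened overall?
total_change = sum(current.values()) - sum(previous.values())

# Diagnostic step: which region drove the change?
change_by_region = {region: current[region] - previous[region] for region in previous}
main_driver = min(change_by_region, key=change_by_region.get)

print(total_change)                                # -290.0
print(main_driver, change_by_region[main_driver])  # South -280.0
```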
What is Predictive Analytics?
Predictive analytics is a form of advanced analytics that uses machine learning to
determine what is likely to happen based on historical data. The historical data
that forms the bulk of descriptive and diagnostic analytics is used as the basis for
building predictive analytics models. Predictive analytics helps companies address
use cases such as:
• Predicting maintenance issues and part breakdowns in machines.
• Determining credit risk and identifying potential fraud.
• Predicting and avoiding customer churn by identifying signs of customer
dissatisfaction.
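Real predictive models are usually far more complex. The basic idea of fitting a model to historical data and extrapolating can still be sketched with an ordinary least-squares trend line. This is a deliberately simple, swapped-in technique rather than a machine learning pipeline, and the quarterly figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical historical sales for quarters 1 to 4
quarters = [1, 2, 3, 4]
sales = [100.0, 110.0, 120.0, 130.0]

slope, intercept = fit_line(quarters, sales)
forecast_q5 = slope * 5 + intercept  # predict the next quarter

print(slope, intercept)  # 10.0 90.0
print(forecast_q5)       # 140.0
```

In R, lm() fits this kind of linear model, and predict() produces the forecast.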
What is Prescriptive Analytics?
Prescriptive analytics is the fourth and final pillar of modern analytics. It is truly
guided analytics: your analytics prescribes, or guides you toward, a specific action
to take. It effectively merges descriptive, diagnostic, and predictive analytics to
drive decision making. Existing scenarios or conditions (think of your current fleet
of freight trains) and the ramifications of a decision or occurrence (parts breaking
down on the freight trains) are used to create a guided decision or action for the
user to take (proactively buying more parts for preventive maintenance).
Prescriptive analytics requires strong competencies in descriptive, diagnostic, and
predictive analytics, which is why it tends to be found in highly specialized industries
(oil and gas, clinical healthcare, finance, and insurance, to name a few) where use
cases are well defined. Prescriptive analytics helps address use cases such as:
• Automatically adjusting product pricing based on anticipated customer
demand and external factors.
• Flagging selected employees for additional training based on incident reports
from the field.
The primary aim of prescriptive analytics is to take the guesswork out of data
analytics and streamline the decision-making process.
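The pricing use case above can be sketched as "predict, then choose the best action." In this Python sketch, expected_demand is a hypothetical stand-in for a predictive model that assumes demand falls linearly with price. The candidate prices are also assumed. The prescriptive step recommends the price with the highest predicted revenue.

```python
def expected_demand(price):
    """Hypothetical predictive model: demand falls linearly as price rises."""
    return max(0.0, 500.0 - 20.0 * price)

def expected_revenue(price):
    return price * expected_demand(price)

# Candidate prices the business is willing to charge (assumed)
candidate_prices = [10.0, 12.5, 15.0, 17.5, 20.0]

# Prescriptive step: recommend the action (price) with the best predicted outcome
best_price = max(candidate_prices, key=expected_revenue)

for p in candidate_prices:
    print(p, expected_revenue(p))
print("Recommended price:", best_price)  # Recommended price: 12.5
```

Real systems would add constraints (stock, competitor prices, external factors), but the structure stays the same: a prediction feeds an optimisation that outputs an action.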
Roles and Responsibilities of a Data Scientist
Data Scientists collect data and explore, analyze, and visualize it. They apply
mathematical and statistical models to find patterns and solutions in the data.
Basic Skills of a Data Scientist
A Data Scientist should be able to:
• Ask the right questions
• Understand data structures
• Interpret and wrangle data
• Apply statistical and mathematical methods
• Visualize data and communicate with stakeholders
• Work as a team player
Roles and Responsibilities of a Data Scientist

• Identify valuable data sources and automate collection processes
• Undertake preprocessing of structured and unstructured data
• Analyze large amounts of information to discover trends and patterns
• Build predictive models and machine-learning algorithms
• Combine models through ensemble modeling
• Present information using data visualization techniques
• Propose solutions and strategies to business challenges
• Collaborate with engineering and product development teams
What Is Hypothesis Testing in Statistics?
Hypothesis testing is a type of statistical analysis in which you put your
assumptions about a population parameter to the test. It is often used to assess
whether there is a relationship between two statistical variables.
Here are a few real-life examples of statistical hypotheses:
• A teacher assumes that 60% of his college's students come from lower-
middle-class families.
• A doctor believes that 3D (Diet, Dose, and Discipline) is 90% effective for
diabetic patients.
Now that you know what hypothesis testing is, let's look at how it works and at the
two types of hypotheses it compares.
How Does Hypothesis Testing Work?
An analyst performs hypothesis testing on a statistical sample to weigh the evidence
against the null hypothesis. Measurements and analyses are conducted on a random
sample of the population to test a theory. Analysts use this random sample to test
two hypotheses: the null hypothesis and the alternative hypothesis.
The null hypothesis is typically a hypothesis of equality between population
parameters; for example, a null hypothesis may claim that the population mean
return equals zero. The alternate hypothesis is essentially the inverse of the null
hypothesis (e.g., the population mean return is not equal to zero). The two are
therefore mutually exclusive and together cover every possibility, so exactly one of
them is correct.
Null Hypothesis and Alternate Hypothesis
The null hypothesis is the default assumption that there is no effect or difference,
i.e., that the event will not occur. It is retained unless the sample provides enough
evidence to reject it.
H0 is the symbol for it, and it is pronounced H-naught.
The Alternate Hypothesis is the logical opposite of the null hypothesis. The
acceptance of the alternative hypothesis follows the rejection of the null
hypothesis. H1 is the symbol for it.
Let's understand this with an example.
A sanitizer manufacturer claims that its product kills 95 percent of germs on
average.
To put this claim to the test, we set up a null and an alternate hypothesis:
H0 (Null Hypothesis): Average = 95%
H1 (Alternate Hypothesis): Average < 95%
Another straightforward example is determining whether a coin is fair. The null
hypothesis states that the probability of heads equals the probability of tails
(each is 0.5). In contrast, the alternate hypothesis states that the probabilities of
heads and tails differ.
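The coin example can be tested exactly with the binomial distribution. The Python sketch below computes an exact two-sided binomial p-value for a hypothetical result of 16 heads in 20 flips, using the common significance level of 0.05. The p-value sums the probabilities of all outcomes that are no more likely than the observed one, which is the approach R's binom.test() documents for its two-sided test.

```python
from math import comb

def binomial_two_sided_p(successes, n, p=0.5):
    """Exact two-sided binomial test p-value.

    Sums the probabilities of every outcome that is no more likely than the
    observed one under H0 (a small tolerance handles floating-point ties).
    """
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    return sum(prob for prob in pmf if prob <= observed * (1 + 1e-7))

# Hypothetical experiment: 16 heads in 20 flips
n_flips, n_heads = 20, 16
alpha = 0.05  # significance level

p_value = binomial_two_sided_p(n_heads, n_flips)  # H0: P(heads) = 0.5
print(round(p_value, 4))  # 0.0118
print("Reject H0" if p_value < alpha else "Fail to reject H0")  # Reject H0
```

In R, binom.test(16, 20, p = 0.5) runs the same test.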
One-Tailed and Two-Tailed Hypothesis Testing
The one-tailed test, also called a directional test, considers a critical region of data
that would result in the null hypothesis being rejected if the test sample falls into it,
which means accepting the alternate hypothesis.
In a one-tailed test, the critical region is one-sided: the test checks only whether the
sample statistic is greater than, or only whether it is less than, a specific value.
In a two-tailed test, the critical region is two-sided: the test checks whether the
sample statistic is either greater or less than a range of values.
If the sample statistic falls outside this range, in either tail, the null hypothesis is
rejected and the alternate hypothesis is accepted.
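To see how the choice of tails matters, this Python sketch applies a z-test to the sanitizer claim (H0: average = 95%). The sample mean, sample size, and population standard deviation are all assumed values, chosen so that the two tests disagree. A z-test assumes the population standard deviation is known. When it has to be estimated from the sample, a t-test (t.test() in R) is the usual choice.

```python
from math import sqrt
from statistics import NormalDist

# Assumed sample summary for the sanitizer claim (H0: mean = 95)
mu0 = 95.0          # claimed average kill rate (%)
sample_mean = 94.4  # hypothetical sample mean
sigma = 2.0         # assumed known population standard deviation
n = 36              # hypothetical sample size
alpha = 0.05

z = (sample_mean - mu0) / (sigma / sqrt(n))
std_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed (H1: mean < 95): only the left tail counts as evidence
p_one_tailed = std_normal.cdf(z)
reject_one_tailed = z < std_normal.inv_cdf(alpha)  # critical value about -1.645

# Two-tailed (H1: mean != 95): both tails count, alpha split between them
p_two_tailed = 2 * std_normal.cdf(-abs(z))
reject_two_tailed = abs(z) > std_normal.inv_cdf(1 - alpha / 2)  # about 1.960

print(f"z = {z:.2f}")  # z = -1.80
print(f"one-tailed p = {p_one_tailed:.4f}, reject H0: {reject_one_tailed}")  # 0.0359, True
print(f"two-tailed p = {p_two_tailed:.4f}, reject H0: {reject_two_tailed}")  # 0.0719, False
```

The same z-statistic falls in the one-tailed critical region but not in the two-tailed one, because the two-tailed test splits α across both tails. This is why the direction of H1 should be fixed before looking at the data.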
Type 1 and Type 2 Error
A hypothesis test can result in two types of errors.
Type 1 Error: A Type-I error occurs when the null hypothesis is rejected even
though it is true. Its probability is the significance level, α.
Type 2 Error: A Type-II error occurs when the null hypothesis is not rejected even
though it is false. Its probability is usually denoted β.
Example:
Suppose a teacher evaluates the examination paper to decide whether a student
passes or fails.
H0: Student has passed
H1: Student has failed
Type I error will be the teacher failing the student [rejects H0] although the student
scored the passing marks [H0 was true].
Type II error will be the case where the teacher passes the student [do not reject
H0] although the student did not score the passing marks [H1 is true].
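For a concrete decision rule, both error probabilities can be computed directly. This Python sketch revisits the coin test with 20 flips and a hypothetical rule: reject "the coin is fair" if there are 5 or fewer heads, or 15 or more. To evaluate the Type II error, it assumes the coin's true probability of heads is 0.7.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 20
# Hypothetical decision rule: reject H0 (fair coin) if heads <= 5 or heads >= 15
rejection_region = [k for k in range(n + 1) if k <= 5 or k >= 15]

# Type I error: probability of rejecting H0 when it is true (p = 0.5)
type_1 = sum(binom_pmf(k, n, 0.5) for k in rejection_region)

# Type II error: probability of NOT rejecting H0 when it is false
# (assuming the true probability of heads is 0.7)
type_2 = sum(binom_pmf(k, n, 0.7) for k in range(n + 1) if k not in rejection_region)

print(round(type_1, 4))  # 0.0414
print(round(type_2, 4))
```

For a fixed sample size, widening the rejection region lowers the Type II error but raises the Type I error. Collecting more data is the usual way to reduce both.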
