
Business Statistics

The Business Statistics and Analysis Specialization is designed to equip you with a
basic understanding of business data analysis tools and techniques. You’ll master
essential spreadsheet functions, build descriptive business data measures, and develop
your aptitude for data modeling. You’ll also explore basic probability concepts, including
measuring and modeling uncertainty, and you’ll use various data distributions, along
with the Linear Regression Model, to analyze and inform business decisions. The
Specialization culminates with a Capstone Project in which you’ll apply the skills and
knowledge you’ve gained to an actual business problem.

Types of Data in Statistics


Qualitative or Categorical Data
Qualitative data, also known as categorical data, describes data that fits into
categories. Qualitative data are not numerical. Categorical information involves
categorical variables that describe features such as a person’s gender, home town,
etc. Categorical measures are defined in terms of natural language descriptions
rather than numbers.
Sometimes categorical data can hold numerical values, but those values have no
mathematical meaning. Examples of categorical data are birthdate, favourite sport
and school postcode. Here, the birthdate and school postcode are written with
numbers, but doing arithmetic on them (adding two postcodes, say) gives no
meaningful result.

Nominal Data
Nominal data is a type of qualitative data that labels variables without giving
them a numerical value. Nominal data is also said to be measured on a nominal scale.
It cannot be ordered or measured. In some cases, however, nominal data can appear in
both qualitative and quantitative forms (for example, numeric codes used as labels).
Examples of nominal data are letters, symbols, words, gender, etc.
Nominal data are examined using the grouping method: the data are grouped into
categories, and then the frequency or percentage of each category is calculated.
These data are usually displayed visually with pie charts.
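The grouping method can be sketched in a few lines of Python. This is a minimal illustration, assuming a small hypothetical list of favourite-sport responses; the data and variable names are made up for the example. It counts each category and converts the counts to percentages, the same numbers a pie chart would show as slice sizes.

```python
from collections import Counter

# Hypothetical sample of a nominal variable (favourite sport); values are illustrative only.
responses = ["football", "tennis", "football", "cricket",
             "tennis", "football", "swimming", "football"]

# Grouping method: count how many observations fall into each category.
counts = Counter(responses)
total = sum(counts.values())

# Percentage of observations in each category (the slice sizes of a pie chart).
percentages = {category: 100 * n / total for category, n in counts.items()}

# Nominal categories have no natural order, so list them by frequency instead.
for category, n in counts.most_common():
    print(f"{category:<10} {n:>2}  {percentages[category]:5.1f}%")
```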

Ordinal Data
Ordinal data (an ordinal variable) is a type of data that follows a natural order.
The significant feature of ordinal data is that the differences between data values
are not determined: we know that one value ranks above another, but not by how much.
Ordinal variables are mostly found in surveys, finance, economics, questionnaires,
and so on.
Ordinal data are commonly represented using a bar chart. They can be investigated
and interpreted through many visualisation tools. The information may also be
expressed in a table in which each row shows a distinct category.
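As a rough sketch (the satisfaction scale and the answers below are hypothetical), the following Python lists one row per category in the scale's natural order, as a bar chart or table of ordinal data would, and uses the ranks only for ordering, for example to find the median answer.

```python
from collections import Counter

# Natural order of a hypothetical satisfaction scale (an assumption for illustration).
SCALE = ["very dissatisfied", "dissatisfied", "neutral", "satisfied", "very satisfied"]

# Hypothetical survey answers.
answers = ["satisfied", "neutral", "very satisfied", "satisfied",
           "dissatisfied", "satisfied", "neutral"]

counts = Counter(answers)

# One row per distinct category, listed in the scale's order rather than by frequency.
table = [(level, counts.get(level, 0)) for level in SCALE]
for level, n in table:
    print(f"{level:<18} {n}")

# Ranks let us sort the answers, but the gap between two ranks has no numeric meaning.
ranked = sorted(answers, key=SCALE.index)
median_answer = ranked[len(ranked) // 2]   # odd number of answers: a single middle value
print("Median answer:", median_answer)
```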

Quantitative or Numerical Data


Quantitative data, also known as numerical data, represents numerical values
(i.e., how much, how often, how many). Numerical data gives information about the
quantities of a specific thing. Some examples of numerical data are height, length, size,
weight, and so on. Quantitative data can be classified into two types according to the
values it can take: discrete data and continuous data.

Discrete Data
Discrete data can take only distinct, separate values. Discrete information contains a
countable number of possible values (often a finite one), and those values cannot be
meaningfully subdivided. Here, things are counted in whole numbers.
Example: Number of students in the class

Continuous Data
Continuous data is data that is measured rather than counted. It can take any of an
infinite number of values within a given range.
Example: Temperature (any value within a range)

Data Summarization
The term data summarization refers to presenting a summary of the generated
data in an easily comprehensible and informative manner. Presenting the raw
data (the full set of individual measurements that were generated) is not
practical in many cases.

Tabular Presentation
A table helps to represent even a large amount of data in an engaging,
easy-to-read and organized manner. The data is arranged in rows and columns.
This is one of the most popular forms of data presentation, as data
tables are simple to prepare and read.

Objectives Of Tabulation

 To simplify complex data
 To bring out the essential features of the data
 To facilitate comparison
 To facilitate statistical analysis
 To save space

Graphic Presentation
Graphic presentation represents a highly developed body of techniques for
elucidating, interpreting, and analyzing numerical facts by means of points,
lines, areas, and other geometric forms and symbols. Graphic techniques are
especially valuable in presenting quantitative data in a simple, clear, and
effective manner, as well as facilitating comparisons of values, trends, and
relationships. They have the additional advantages of succinctness and
popular appeal; the comprehensive pictures they provide can bring out hidden
facts and relationships and contribute to a more balanced understanding of a
problem.

Charts

Charts are a great way to visually represent all kinds of information, from the
simple to the very complex.

A variety of chart types can be used in presentations. Some of these chart
types include:

 Time Series
 Bar Charts
 Combo Charts
 Pie Charts
 Tables
 Geo Map
 Scorecard
 Scatter Charts
 Bullet Charts
 Area Chart
 Text & Images

Histogram

A histogram is used to summarize discrete or continuous data. In other
words, it provides a visual interpretation of numerical data by showing the
number of data points that fall within a specified range of values (called
“bins”). It is similar to a vertical bar graph. However, a histogram, unlike a
vertical bar graph, shows no gaps between the bars.
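The counting behind a histogram can be sketched as follows. The function name histogram_counts, the equal-width binning rule and the delivery-time data are assumptions made for illustration; real charting tools offer other ways of choosing bins.

```python
def histogram_counts(data, num_bins):
    """Count how many values fall into each of `num_bins` equal-width bins.

    Bins are half-open [low, high) except the last, which also includes the
    maximum, so every value lands in exactly one bin.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        index = int((x - lo) / width) if width > 0 else 0
        counts[min(index, num_bins - 1)] += 1   # the maximum goes into the last bin
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical delivery times in minutes (illustrative data only).
times = [12, 15, 17, 18, 21, 22, 22, 25, 27, 30, 31, 35]
edges, counts = histogram_counts(times, num_bins=4)

# Text bars, one per bin; adjacent bins share an edge, so there are no gaps.
for left, right, n in zip(edges, edges[1:], counts):
    print(f"{left:5.2f}-{right:5.2f} | {'#' * n}")
```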
Frequency Distribution

A frequency distribution is a representation, either in a graphical or tabular
format, that displays the number of observations within a given interval. The
interval size depends on the data being analysed and the goals of the analyst.
The intervals must be mutually exclusive and exhaustive, so that every
observation falls into exactly one interval.
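One way to make intervals mutually exclusive and exhaustive is to use half-open intervals that cover the whole range of the data. The sketch below assumes that convention (with the last interval closed on the right); the function name frequency_distribution, the class boundaries and the marks are hypothetical.

```python
def frequency_distribution(data, edges):
    """Count observations in the intervals [edges[i], edges[i+1]).

    Half-open intervals are mutually exclusive: a boundary value such as 55 is
    counted only in the interval that starts at 55. The last interval is closed
    on the right so that the top edge is still included.
    """
    counts = [0] * (len(edges) - 1)
    for x in data:
        if not edges[0] <= x <= edges[-1]:
            # Exhaustiveness check: every observation must fall in some interval.
            raise ValueError(f"{x} lies outside {edges[0]}-{edges[-1]}")
        for i in range(len(counts)):
            last = i == len(counts) - 1
            if edges[i] <= x < edges[i + 1] or (last and x == edges[-1]):
                counts[i] += 1
                break
    return counts


# Hypothetical exam marks and analyst-chosen class boundaries.
marks = [47, 50, 52, 58, 61, 64, 66, 70, 73, 79, 85, 100]
edges = [45, 55, 65, 75, 85, 100]
for lo, hi, n in zip(edges, edges[1:], frequency_distribution(marks, edges)):
    print(f"{lo}-{hi}: {n}")
```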

RELATIVE FREQUENCY
The number of times an event occurs is called its frequency. Relative
frequency is an experimental (empirical) measure, not a theoretical one.
Because it is experimental, we may obtain different relative frequencies when
we repeat an experiment. To calculate a relative frequency, we need:

 The frequency count for the total population
 The frequency count for a subgroup of the population

If we know these two frequencies, we can find the relative frequency (an
empirical probability) of the subgroup with the following formula:
Relative Frequency = Subgroup Count / Total Count
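The formula above, and the point that relative frequencies vary between repetitions of an experiment, can be illustrated with a simulated fair coin. This is a sketch only: the function names, the 20 tosses per run and the seeds are arbitrary choices for the example.

```python
import random


def relative_frequency(subgroup_count, total_count):
    """Relative Frequency = Subgroup Count / Total Count."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return subgroup_count / total_count


def heads_relative_frequency(tosses, seed):
    """Toss a simulated fair coin `tosses` times and return the share of heads."""
    rng = random.Random(seed)                       # seeded so each run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    return relative_frequency(heads, tosses)


# Repeating the same 20-toss experiment with different seeds usually gives
# different relative frequencies, although the theoretical probability is 0.5.
for seed in range(3):
    print(f"run {seed}: {heads_relative_frequency(20, seed):.2f}")
```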

How to Calculate Relative Frequency?

The ratio of the number of times a value of the data occurs in the set of all
outcomes to the number of all outcomes gives the value of relative frequency.

Let’s understand the relative frequency formula with the help of an example.

Let’s look at the table below to see how the weights of the people are
distributed.
Step 1: Find the total N, the sum of all the frequencies (40 in this case).

Step 2: Divide each frequency by N to convert it into a relative frequency.

Step 3: For example, a frequency of 1 gives a relative frequency of 1 / 40 = 0.025.

Example: Let us solve another example to understand the concept better.

This is a frequency table showing how many students scored marks within
given intervals in Maths.

Marks      Frequency   Relative Frequency
45 – 50    3           3 / 40 = 0.075
50 – 55    1           1 / 40 = 0.025
55 – 60    1           1 / 40 = 0.025
60 – 65    6           6 / 40 = 0.15
65 – 70    8           8 / 40 = 0.2
70 – 80    3           3 / 40 = 0.075
80 – 90    11          11 / 40 = 0.275
90 – 100   7           7 / 40 = 0.175
Total      40          1
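The steps above can be checked in Python using the Maths marks table. Exact fractions are used so that the relative frequencies add up to exactly 1; the dictionary layout is simply a convenient choice for this sketch.

```python
from fractions import Fraction

# Class intervals and frequencies from the Maths marks table.
table = {
    "45-50": 3, "50-55": 1, "55-60": 1, "60-65": 6,
    "65-70": 8, "70-80": 3, "80-90": 11, "90-100": 7,
}

total = sum(table.values())            # Step 1: N, the sum of all frequencies

# Step 2: divide each frequency by N; Fraction keeps the arithmetic exact.
relative = {interval: Fraction(f, total) for interval, f in table.items()}

for interval, rf in relative.items():
    print(f"{interval:>7}  {table[interval]:>2}  {float(rf):.3f}")

# Relative frequencies of an exhaustive set of classes always add up to 1.
assert sum(relative.values()) == 1
```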

Measures of Central Tendency & Dispersion


Central tendency is a descriptive summary of a dataset through a single value
that reflects the centre of the data distribution. Along with the variability
(dispersion) of a dataset, central tendency is a branch of descriptive statistics.

Central tendency is one of the most fundamental concepts in statistics.
Although it does not provide information about the individual values in the
dataset, it delivers a comprehensive summary of the whole dataset.

Measures of Central Tendency

Generally, the central tendency of a dataset can be described using the
following measures:

Mean (Average): Represents the sum of all values in a dataset divided by the
total number of values.

Median: The middle value in a dataset that is arranged in ascending order
(from the smallest value to the largest value). If a dataset contains an even
number of values, the median of the dataset is the mean of the two middle
values.

Mode: Defines the most frequently occurring value in a dataset. In some
cases, a dataset may contain multiple modes, while some datasets may not
have any mode at all.

Standard deviation: A standard deviation is a statistic that measures the
dispersion of a dataset relative to its mean. The standard deviation is
calculated as the square root of the variance, which is found by determining
each data point's deviation relative to the mean. The further the data points
are from the mean, the higher the deviation within the data set; thus, the more
spread out the data, the higher the standard deviation.

Variance: The term variance refers to a statistical measurement of the spread
between numbers in a data set. More specifically, variance measures how far
each number in the set is from the mean and thus from every other number in
the set. Variance is often denoted by the symbol σ². It is used by both
analysts and traders to determine volatility and market security. The square
root of the variance is the standard deviation (σ), which helps determine the
consistency of an investment’s returns over a period of time.
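The measures above can be computed with Python's standard statistics module. The returns below are hypothetical numbers chosen so the results are easy to check by hand; the variance is also computed directly from its definition (the population version, dividing by n) to show where the figure comes from.

```python
import math
import statistics

# Hypothetical monthly returns (%) of an investment; illustrative data only.
returns = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Central tendency
mean = statistics.mean(returns)          # sum of values / number of values
median = statistics.median(returns)      # even count: mean of the two middle values
modes = statistics.multimode(returns)    # all values tied for the highest frequency

# Dispersion: population variance (sigma^2) is the average squared deviation
# from the mean; the standard deviation (sigma) is its square root.
variance = sum((r - mean) ** 2 for r in returns) / len(returns)
std_dev = math.sqrt(variance)

# The library agrees; statistics.variance/stdev divide by n - 1 instead
# (the sample versions), which gives slightly larger values.
assert math.isclose(variance, statistics.pvariance(returns))
assert math.isclose(std_dev, statistics.pstdev(returns))

print(f"mean={mean}, median={median}, modes={modes}")
print(f"variance={variance}, standard deviation={std_dev}")
```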

Even though the mean, median and mode are the most commonly used measures of
central tendency (the standard deviation and variance describe dispersion rather
than the centre), there are some other measures, including, but not limited
to, the geometric mean, harmonic mean, midrange, and geometric median.

The selection of a central tendency measure depends on the properties of a
dataset. For instance, the mode is the only central tendency measure for
nominal (unordered categorical) data, while the median works best with
ordinal data.

Although the mean is regarded as the best measure of central tendency for
quantitative data, that is not always the case. For example, the mean may not
work well with quantitative datasets that contain extremely large or extremely
small values. The extreme values may distort the mean. In such cases, you may
consider other measures, such as the median.
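A small hypothetical example shows how one extreme value distorts the mean but barely moves the median. The salary figures are invented for illustration.

```python
import statistics

# Hypothetical annual salaries (in thousands) at a small firm; the last one is the owner's.
salaries = [30, 32, 35, 38, 40, 1000]

# One extreme value pulls the mean far above every typical salary.
print("mean:  ", round(statistics.mean(salaries), 2))
# The median stays with the typical values.
print("median:", statistics.median(salaries))

# Dropping the extreme value moves the mean a lot but the median only a little.
print("mean without outlier:  ", statistics.mean(salaries[:-1]))
print("median without outlier:", statistics.median(salaries[:-1]))
```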

PROBABILITY DISTRIBUTION
The probability distribution is one of the major concepts of statistical analysis.
It gives the probability of each possible outcome of a random event, so the
probabilities of all outcomes can be read from the probability distribution.
A quick review of probability theory helps in understanding probability
distributions thoroughly: probability measures the certainty or uncertainty of
the different outcomes of a given event.

PROBABILITY DISTRIBUTION TYPES:


A) Continuous Probability Distribution (e.g., the Normal Distribution)

In a continuous probability distribution, the set of all the outcomes that can
be achieved takes values over a continuous range. Take the set of real numbers
as an example: it is continuous, and all the possible outcomes can be real
numbers. Whole numbers and prime numbers, by contrast, are not continuous; they
take separate, countable values and belong to the discrete case. These are all
mathematical examples.

We should also know some real-life examples of continuous probability
distributions. The temperature of the day is one real-life example of a
continuous quantity, and once the outcomes are observed, a distribution table
can be made. Other quantities that are often modelled with (or approximated by)
the normal distribution include the sizes of female shoes, scores awarded by
judges in competitions, the weight of newborns and the heights of the world's
population. Rolling a die and tossing a coin, on the other hand, produce
discrete outcomes.

B) Discrete Probability Distribution (e.g., the Binomial Distribution)

When the set of outcomes is discrete, the distribution is known as a discrete
probability distribution. Say, for instance, that dice are rolled: the outcomes
are separate values, and the probability attached to each one is given by a
probability mass function. Some major examples of binomial probability
distributions are counting the used materials in a manufacturing process,
surveying people for positive and negative feedback on something, counting the
women and men in an organization, and estimating through a survey how many
people watch a channel.

Example of a Probability Distribution

As a simple example of a probability distribution, let us look at the number
observed when rolling two standard six-sided dice. Each die has a 1/6
probability of rolling any single number, one through six, but the sum of two
dice will form the probability distribution depicted in the image below. Seven
is the most common outcome (1+6, 6+1, 5+2, 2+5, 3+4, 4+3). Two and twelve,
on the other hand, are far less likely (1+1 and 6+6).
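The two-dice distribution can be reproduced by enumerating all 36 equally likely pairs of faces. A minimal Python sketch, assuming fair dice:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (die 1, die 2) outcomes of rolling two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))
sum_counts = Counter(a + b for a, b in outcomes)

# Probability of each total = number of ways to get it / 36.
distribution = {total: Fraction(n, len(outcomes))
                for total, n in sorted(sum_counts.items())}

# Print each total, its exact probability and a text bar of its number of ways.
for total, p in distribution.items():
    print(f"{total:>2}: {str(p):>5}  {'*' * sum_counts[total]}")
```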
CONTINUOUS DISTRIBUTION
A continuous distribution is one in which data can take on any value within a
specified range (which may be infinite). A continuous distribution has an
infinite number of possible values, and the probability associated with any
particular value of a continuous distribution is zero. Therefore, continuous
distributions are normally described in terms of probability density, which can
be converted into the probability that a value will fall within a certain range.

Example of a continuous distribution:

The continuous normal distribution can describe the distribution of the weight of
adult males. For example, you can calculate the probability that a man weighs
between 160 and 170 pounds.
The shaded region under the curve in this example represents the range from
160 to 170 pounds. The area of this range is 0.136; therefore, the probability
that a randomly selected man weighs between 160 and 170 pounds is 13.6%.
The entire area under the curve equals 1.0.

However, the probability that X (a man's weight) is exactly equal to some value
is always zero, because the area under the curve at a single point, which has no
width, is zero. For example, the probability that a man weighs exactly 190 pounds
to infinite precision is zero. You could calculate a nonzero probability that a
man weighs more than 190 pounds, or less than 190 pounds, or between 189.9 and
190.1 pounds, but the probability that he weighs exactly 190 pounds is zero.
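The areas described above can be computed from the cumulative distribution function (CDF) of a normal distribution, here with Python's statistics.NormalDist. The text does not give the curve's mean or standard deviation, so a mean of 180 pounds and a standard deviation of 10 pounds are assumed purely for illustration; with these values the area between 160 and 170 pounds comes out to about 0.136.

```python
from statistics import NormalDist

# ASSUMPTION: the curve's parameters are not given in the text; a mean of 180 lb
# and a standard deviation of 10 lb are illustrative values only.
weights = NormalDist(mu=180, sigma=10)

# P(160 < X < 170): area under the density curve between 160 and 170,
# computed as the difference of two CDF values.
p_range = weights.cdf(170) - weights.cdf(160)
print(f"P(160 < X < 170) = {p_range:.3f}")

# A narrow interval around 190 lb still has a small, nonzero probability.
p_near_190 = weights.cdf(190.1) - weights.cdf(189.9)
print(f"P(189.9 < X < 190.1) = {p_near_190:.4f}")

# Shrinking the interval towards the single point 190 drives the probability to zero.
for half_width in (1.0, 0.1, 0.01, 0.001):
    p = weights.cdf(190 + half_width) - weights.cdf(190 - half_width)
    print(f"190 +/- {half_width}: {p:.6f}")
```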

DISCRETE DISTRIBUTION
A discrete distribution is a probability distribution that depicts the occurrence
of discrete (individually countable) outcomes, such as 1, 2, 3... or zero vs. one.
The binomial distribution, for example, is a discrete distribution that evaluates
the probability of a "yes" or "no" outcome occurring over a given number of
trials, given the event's probability in each trial, such as the number of
"heads" obtained when a coin is flipped one hundred times.
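The binomial probabilities for the coin example can be computed from the probability mass function P(X = k) = C(n, k) p^k (1 - p)^(n - k). A minimal sketch, assuming a fair coin (p = 0.5) and n = 100 flips; binomial_pmf is a helper written for this example, not a library function.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 100, 0.5   # one hundred flips of a fair coin; "success" = heads

print(f"P(exactly 50 heads) = {binomial_pmf(50, n, p):.4f}")
print(f"P(at most 40 heads) = {sum(binomial_pmf(k, n, p) for k in range(41)):.4f}")

# The probabilities of all possible counts 0..100 add up to 1.
assert abs(sum(binomial_pmf(k, n, p) for k in range(n + 1)) - 1) < 1e-9
```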

Distribution is a statistical concept used in data research. Those seeking to
identify the outcomes and probabilities of a particular study will chart
measurable data points from a data set, resulting in a probability distribution
diagram. There are many types of probability distribution diagram shapes that
can result from a distribution study, such as the normal distribution ("bell
curve").

Statisticians can tell whether a distribution is discrete or continuous from
the nature of the outcomes to be measured. Unlike the normal distribution,
which is continuous and accounts for any possible outcome along the number
line, a discrete distribution is constructed from data that can only follow a
finite or otherwise countable set of outcomes.

Discrete Distribution Example

Types of discrete probability distributions include:

 Poisson
 Bernoulli
 Binomial
 Multinomial

Example:

Consider an example where we are counting the number of people walking
into a store in any given hour. The values would need to be countable, finite,
non-negative integers. It would not be possible to have 0.5 people walk into a
store, and it would not be possible to have a negative number of people walk
into a store. Therefore, the distribution of the values, when represented on a
distribution plot, would be discrete.
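One common way to model counts like store arrivals is the Poisson distribution, listed above among the discrete types; applying it to this example is an assumption here, as is the average of 4 customers per hour. The sketch shows that probabilities are assigned only to whole, non-negative counts.

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """P(exactly k arrivals in an interval) when arrivals average `rate` per interval."""
    return rate**k * exp(-rate) / factorial(k)


# ASSUMPTION: on average 4 customers walk in per hour (a hypothetical figure).
rate = 4

# Only whole, non-negative counts 0, 1, 2, ... get a probability; 0.5 or -1 people do not.
for k in range(9):
    print(f"{k} customers: {poisson_pmf(k, rate):.3f}")

# Whatever probability the listed counts leave over belongs to "more than 8".
print(f"more than 8: {1 - sum(poisson_pmf(k, rate) for k in range(9)):.3f}")
```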
