
STATISTICS

Statistics comprises useful data interpretation tools like the mean, median, mode, standard
deviation, coefficient of variation, and sample tests. Raw financial data in numerical format is
interpreted using mathematical formulas. Many sectors, such as science, government, manufacturing,
population studies, psychology, banking, and financial markets, rely on statistical data.

“Statistics is extensively used to enhance Business performance through Analytics”

Statistics Explained

Statistics is the systematic processing and interpretation of raw data to compile a conclusive
result. These reports are drafted in a numerical format. They are presented in a succinct manner
so that one can read and understand easily. One should be able to comprehend them at a mere
glance.
Financial data is in a numerical format and includes details about portfolios, investments, and
assets. Historical data and present data are interpreted using mathematical formulas. Forecasts
are based on available information and requirements.

Application of Statistics
Statistics is indispensable for decision-making in various sectors and
verticals. It is applied in marketing, e-commerce, banking, finance,
human resource, production, and information technology. In addition,
this mathematical discipline has been a prominent part of research
and is widely used in data mining, medicine, aerospace, robotics,
psychology, and machine learning.

Not to forget the economics, government, and public sectors, where
statistical data is a significant part of decision-making. For example, it
is used for public surveys, weather forecasts, sports scoring,
and budgeting.

What does statistics mean?


It is the science behind identifying, collecting, organizing,
summarizing, analyzing, interpreting, and presenting data. The data
can be either qualitative or quantitative. This mathematical discipline
is employed in decision-making.

Why is statistics important?


Statistical data analysis plays a crucial role in scientific discoveries,
research, economic decisions, government budgeting, public welfare
activities, weather forecasting, and stock analysis. In addition, this
mathematical discipline makes decision-making more objective.
It is even an essential part of day-to-day life. For instance, it is used in
schools and colleges to find the average percentage scored by students.
Similarly, in households, it is used to determine the per-day expense.
Discuss the importance of Statistics in Business.

One can understand the importance of Statistics in business from the following:
(i) Marketing - Statistical analysis is frequently used to provide information for decision-making
in the field of marketing. It is necessary first to find out what can be sold and then to evolve a
suitable strategy so that the goods reach the ultimate consumer. Before any attempt to establish a
new market, a skilful analysis of data on production, purchasing power, manpower, consumer
habits, and transportation costs should be made.
(ii) Production - In the field of production, statistical data and methods play a very important role.
The decisions about what to produce, how to produce, when to produce, and for whom to produce
are based largely on statistical analysis.
(iii) Finance - Financial organizations depend very heavily on statistical analysis of facts and
figures to discharge their finance function effectively.
(iv) Banking - Banking institutions have found it increasingly necessary to establish research
departments within their organizations for the purpose of gathering and analysing information, not
only regarding their own business but also regarding the general economic situation and every
segment of business in which they may have an interest.
(v) Investment - Statistics greatly assists investors in making clear and sound judgments in their
investment decisions, in selecting securities which are safe and have the best prospects of yielding
a good income.
(vi) Purchase - The purchasing department, in discharging its function, makes use of statistical
data to frame suitable purchase policies: what to buy, what quantity to buy, when to buy, where to
buy, and from whom to buy.
(vii) Accounting - Statistical data are also employed in accounting, particularly in the auditing
function, where the techniques of sampling and estimation are frequently used.
(viii) Control - The management control process combines statistical and accounting methods in
making the overall budget for the coming year, including sales, materials, labour and other costs,
net profits and capital requirements.

Marketing
As per Philip Kotler and Gary Armstrong, marketing "identifies customer needs and wants,
determines which target markets the organisation can serve best, and designs appropriate
products, services and programs to serve these markets".

Marketing is all about creating and growing customers profitably. Statistics is used in almost
every aspect of creating and growing customers profitably. Statistics is extensively used in
making decisions regarding how to sell products to customers. Also, intelligent use of
statistics helps managers to design marketing campaigns targeted at the potential customers.
Marketing research is the systematic and objective gathering, recording and analysis of data
about aspects related to marketing. IMRB International, TNS India, RNB Research,
Nielsen, Hansa Research and Ipsos Indica Research are some of the popular market research
companies in India. Web analytics is about the tracking of online behaviour of potential
customers and studying the behaviour of browsers to various websites.
Use of Statistics is indispensable in forecasting sales, market share and demand for various
types of Industrial products.

Factor analysis, conjoint analysis and multidimensional scaling are invaluable tools which
are based on statistical concepts, for designing of products and services based on customer
response.

Finance
Uncertainty is the hallmark of the financial world. All financial decisions are
based on “Expectation” that is best analysed with the help of the theory of
probability and statistical techniques. Probability and statistics are used
extensively in designing of new insurance policies and in fixing of premiums for
insurance policies. Statistical tools and techniques are used for analysing and
quantifying risk, and are also used in the valuation of derivative instruments
and in comparing the returns on investment in two or more instruments or companies.
The beta of a stock or equity is a statistical tool for comparing volatility, and is
highly useful for the selection of a portfolio of stocks.
The most sophisticated traders in today's stock markets are those who trade in
"derivatives", i.e. financial instruments whose price depends on the
price of some other underlying asset.
Economics
Statistical data and methods render valuable assistance in the proper
understanding of the economic problem and the formulation of economic
policies. Most economic phenomena and indicators can be quantified and dealt
with statistically sound logic.
In fact, Statistics became so integrated with Economics that it led to the
development of a new subject called Econometrics, which deals with
economic issues involving the use of Statistics.
Operations
The field of operations is about transforming various resources into products and
services in the place, quantity, cost, quality and time required by the
customers. Statistics plays a very useful role at the input stage through sampling
inspection and inventory management, in the process stage through statistical
quality control and the Six Sigma method, and in the output stage through sampling
inspection. The term Six Sigma quality refers to a situation where there are only
3.4 defects per million opportunities.
Human Resource Management or Development
Human Resource departments are inter alia entrusted with the responsibility of
evaluating the performance, developing rating systems, evolving compensatory
reward and training system, etc. All these functions involve designing forms,
collecting, storing, retrieval and analysis of a mass of data. All these functions
can be performed efficiently and effectively with the help of statistics.

What is Population?

In statistics, population is the entire set of items from which you draw data for a
statistical study. It can be a group of individuals, a set of items, etc. It makes up the data
pool for a study.

Generally, population refers to the people who live in a particular area at a specific time.
But in statistics, population refers to data on your study of interest. It can be a group of
individuals, objects, events, organizations, etc. You use populations to draw
conclusions.

An example of a population would be the entire student body at a school. It would
contain all the students who study in that school at the time of data collection.
Depending on the problem statement, data from each of these students is collected. An
example of a study would be finding the students who speak Hindi among the students
of a school.

For the above situation, it is easy to collect data. The population is small and willing to
provide data and can be contacted. The data collected will be complete and reliable.

If you had to collect the same data from a larger population, say the entire country of
India, it would be impossible to draw reliable conclusions because of geographical and
accessibility constraints, not to mention time and resource constraints. A lot of data
would be missing or might be unreliable. Furthermore, due to accessibility issues,
marginalized tribes or villages might not provide data at all, making the data biased
towards certain regions or groups.
What is a Sample?

A sample is defined as a smaller and more manageable representation of a larger
group: a subset of a larger population that contains the characteristics of that population. A
sample is used in statistical testing when the population size is too large for all members
or observations to be included in the test.

The sample is an unbiased subset of the population that best represents the whole
data.

To overcome the restraints of a population, you can sometimes collect data from a
subset of your population and then consider it as the general norm. You collect the
subset information from the groups who have taken part in the study, making the data
reliable. The results obtained for different groups who took part in the study can be
extrapolated to generalize for the population.

Population vs Sample examples:

Population: all residents of a country. Sample: all residents who live above the poverty line.
Population: all residents above the poverty line in a country. Sample: all residents who are millionaires.
Population: all employees in an office. Sample: all managers in the office.
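The contrast between a population and a sample can be sketched in code. This is a hypothetical illustration using only Python's standard library; the exam-score population and the sample size are invented for the example.

```python
import random
import statistics

random.seed(42)

# Hypothetical population: exam scores of 10,000 students
population = [random.gauss(65, 12) for _ in range(10_000)]

# A sample: 100 students chosen at random from that population
sample = random.sample(population, 100)

pop_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean: {pop_mean:.2f}")
print(f"Sample mean:     {sample_mean:.2f}")
```

Because the sample is drawn at random, its mean is close to, but rarely identical to, the population mean; that gap is exactly what sampling theory quantifies.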
4 Types of Data in Statistics

Qualitative and Quantitative Data


 
Qualitative data is information that cannot be measured in the
form of numbers. It is also known as categorical data. It normally comprises
words and narratives, which we label with names.

It delivers information about the qualities of things in the data. The outcome of
qualitative data analysis can take the form of highlighting key words,
extracting data, and elaborating ideas.

Quantitative data, by contrast, consists of numerical, measurable values. The
four scales described below span both kinds: nominal and ordinal data are
qualitative, while interval and ratio data are quantitative.
1. Nominal Data
 
Nominal data are used to label variables that have no quantitative
value and no order. If you change the order of the values, the
meaning remains the same.

Thus, nominal data are observed but not measured, are unordered and
non-equidistant, and have no meaningful zero.

The only logical operation you can perform on nominal data is to state
that one observation is (or is not) equal to another (equality or
inequality), and you can use this to group the data.

You cannot order nominal data, so you cannot sort them. Nor can you
perform arithmetic on them, as that is reserved for numerical data.
With nominal data you can, however, calculate frequencies,
proportions, percentages, and the central point (the mode).
 
Examples of Nominal data:
 
 What languages do you speak?
 

 English
 German
 French
 Punjabi

 
 What’s your nationality?
 

 American
 Indian
 Japanese
 German

 
You can clearly see that in these examples of nominal data the
categories have no order.
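As noted above, the only things you can compute for nominal data are frequencies, proportions, percentages, and the mode. A small Python sketch, using hypothetical language-survey responses:

```python
from collections import Counter

# Hypothetical nominal data: languages reported by ten survey respondents
responses = ["English", "Punjabi", "English", "German", "French",
             "English", "Punjabi", "German", "English", "French"]

counts = Counter(responses)                          # frequencies
total = len(responses)
proportions = {lang: n / total for lang, n in counts.items()}
mode = counts.most_common(1)[0][0]                   # the only central point nominal data allows

print(counts["English"], proportions["English"], mode)   # 4 0.4 English
```

Note that sorting or averaging the category names would be meaningless, which is exactly the limitation the text describes.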
 
2. Ordinal Data
 
Ordinal data is almost the same as nominal data, except that its
categories can be ordered, like 1st, 2nd, etc. However, the relative
distances between adjacent categories are not equal or even known.

Ordinal data is observed but not measured, is ordered but non-
equidistant, and has no meaningful zero. Ordinal scales are typically
used for measuring happiness, satisfaction, etc.

With ordinal data, as with nominal data, you can group the
observations by evaluating whether they are equal or different.

As ordinal data are ordered, they can also be arranged by making basic
comparisons between the categories, for example greater or less than,
higher or lower, and so on.

You cannot do arithmetic with ordinal data, however, as the category
labels are not truly numerical.

With ordinal data you can calculate the same things as with nominal
data (frequencies, proportions, percentages, the central point), plus
order-based summary statistics such as the median and percentiles.
 
Examples of Ordinal data:
 
 Opinion
o Agree
o Mostly agree
o Neutral
o Mostly disagree
o Disagree

 
 Time of day
o Morning
o Noon
o Night

 
In these examples, there is an obvious order to the categories.
 
3. Interval Data
 
Interval data are measured and ordered with equidistant items but have
no meaningful zero.

The central point of an interval scale is that the word 'interval' signifies
'space in between', which is the significant thing to recall: interval scales
tell us not only about the order but also about the value between
every item.

Interval data can be negative, whereas ratio data cannot.

Even though interval data can appear very similar to ratio
data, the difference lies in their defined zero-points. If the
zero-point of the scale has been chosen arbitrarily, then the
data cannot be ratio data and must be interval data.

Hence, with interval data you can readily compare the degrees of the data,
and you can also add or subtract the values.

The descriptive statistics you can calculate for interval
data are the central point (mean, median, mode), the range (minimum,
maximum), and the spread (percentiles, interquartile range, and standard
deviation).

In addition, similar statistical data analysis techniques can
be used for further analysis.
 
Examples of Interval data:
 
 Temperature (°C or F, but not Kelvin)
 Dates (1066, 1492, 1776, etc.)
 Time interval on a 12-hour clock (6 am, 6 pm)
 
4. Ratio Data
 
Ratio data are measured and ordered with equidistant items and a
meaningful zero, and, unlike interval data, can never be negative.

An outstanding example of ratio data is the measurement of height. It
could be measured in centimetres, inches, metres, or feet, and it is not
possible to have a negative height.

Ratio data tell us about the order of the variables and the differences
among them, and they have an absolute zero. This permits a wide range of
estimates and inferences to be calculated and drawn.

Ratio data is fundamentally the same as interval data, except that
zero means none.
 
The descriptive statistics which you can calculate for ratio data are the
same as interval data which are central point (mean, median, mode),
range (minimum, maximum), and spread (percentiles, interquartile range,
and standard deviation).
 
Example of Ratio data:
 
 Age (from 0 years to 100+)
 Temperature (in Kelvin, but not °C or F)
 Distance (measured with a ruler or any other assessing device)
 Time interval (measured with a stop-watch or similar)
 
Therefore, for these examples of ratio data there is an actual, meaningful
zero-point: the age of a person, absolute zero temperature, and distance
measured from a specified point all have real zeros.
MEANING OF MEASURES OF CENTRAL TENDENCY

A measure of central tendency is a summary measure that tries to
describe an entire set of data with one value that reflects the centre
or middle of its distribution.

Chaplin (1975) defines central tendency as the representative value of the
distribution of scores.

English & English (1958) define a measure of central tendency as a statistic
calculated from a set of distinct and independent observations and
measurements of certain items or entities, intended to typify those
observations.

For example, when we talk about the achievement scores of the students of a
class, we find some students with very high or very low scores. However, the
scores of most students lie somewhere between the highest and the
lowest scores of the whole class. There is thus a score around which the data
converge, and this is used as a measure of central tendency.

Mean:
Mean is the average of all values, used for both discrete and continuous
distributions. It is calculated differently for discrete and continuous
data. For discrete data, all the scores are added and divided by the number
of scores. For a continuous distribution, there are different methods to
calculate the mean.
Median:
The median is the middle value in a series of data. For discrete data, when
the number of values in the list is odd, the median is the middle entry after
sorting the list into increasing order.

When the number of values is even, the median is the sum of the two
middle numbers (after sorting the list into increasing order) divided by two.
For a continuous distribution, a different formula is applied.


Mode:
The mode of a list of numbers is the value that occurs most
frequently. For example, in the data 7, 2, 2, 43, 11, 11, 44, 18, 18,
18, 27, 39, 6, the mode is 18, which occurs 3 times. There can be more than one
mode for a distribution with a discrete random variable.

A distribution with two modes is called bimodal, and a distribution with three
modes is called trimodal. The mode of a distribution with a continuous
random variable is calculated differently.

Range:
The range of a set of data is the difference between the largest and smallest
values. In descriptive statistics, the concept of range can carry a more
nuanced meaning, but it is calculated in the same way for discrete random
variable series and continuous random variable series.
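The four measures above can be computed directly with Python's standard statistics module, here on the same numbers used in the mode example:

```python
import statistics

# Same data as in the mode example above
data = [7, 2, 2, 43, 11, 11, 44, 18, 18, 18, 27, 39, 6]

mean = statistics.mean(data)            # sum of values / number of values
median = statistics.median(data)        # middle value after sorting (13 values -> 7th)
mode = statistics.mode(data)            # most frequent value
data_range = max(data) - min(data)      # largest minus smallest

print(f"mean={mean:.2f}, median={median}, mode={mode}, range={data_range}")
# mean=18.92, median=18, mode=18, range=42
```

With 13 (an odd number of) values, the median is simply the 7th sorted entry; for an even count, statistics.median averages the two middle values, matching the rule described above.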

CHARACTERISTICS OF GOOD MEASURES OF CENTRAL TENDENCY

Rightly and rigidly defined: The definition of the measure should be so clear that
everyone interprets it in the same manner, and it should lead to one and the same
result whoever calculates it.

Simple to calculate: Too much complexity and heavy calculation do not make the
measure a good one.

Easy to understand: It should be simple to grasp what the measure conveys.

Based on all the observations: Whatever the measure of central tendency, it should
take every observation in the data into account.

Characteristics of a Good Measure of Dispersion


 It should be easy to calculate and simple to understand.
 It should be based on all the observations of the series.
 It should be rigidly defined.
 It should not be affected by extreme values.
 It should not be unduly affected by sampling fluctuations.

Measures of Dispersion
A measure of dispersion indicates the scattering of data. It explains the disparity of data points from
one another, delivering a precise view of their distribution. A measure of dispersion gives
us an idea of the variation of individual items around the central value.

In other words, dispersion is the extent to which values in a distribution differ from the average of
the distribution. It gives us an idea about the extent to which individual items vary from one another,
and from the central value.
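A minimal Python sketch of common dispersion measures, on a small invented data set. Note that pstdev treats the data as the whole population, while stdev treats it as a sample (dividing by n - 1):

```python
import statistics

data = [12, 15, 11, 14, 18, 13, 16, 15]   # invented sample of daily values

mean = statistics.mean(data)
data_range = max(data) - min(data)        # simplest measure of dispersion
pop_std = statistics.pstdev(data)         # standard deviation, population formula (divide by n)
sample_std = statistics.stdev(data)       # standard deviation, sample formula (divide by n - 1)
cv = pop_std / mean                       # coefficient of variation: dispersion relative to the mean

print(f"range={data_range}, pstdev={pop_std:.3f}, stdev={sample_std:.3f}, CV={cv:.3f}")
```

The coefficient of variation is unit-free, which makes it useful for comparing the dispersion of series measured on different scales.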
What is Probability Distribution?
A probability distribution yields the possible outcomes of any random event. It is defined over
the underlying sample space, the set of possible outcomes of a random experiment. This sample
space could be a set of real numbers, a set of vectors, or a set of any other entities. It is a part of
probability and statistics.

A random experiment is an experiment whose outcome cannot be predicted. Suppose we toss a
coin: we cannot predict whether it will come up Heads or Tails. A possible result of a random
experiment is called an outcome, and the set of all outcomes is called the sample space. With the
help of these experiments or events, we can always create a probability pattern table in terms of
variables and probabilities.

Types of Probability Distribution


There are two types of probability distribution, used for different purposes and different
types of data-generating processes:

1. Normal or Continuous Probability Distribution
2. Binomial or Discrete Probability Distribution

Let us now discuss both types along with their definitions, formulas and examples.

Continuous Probability Distribution


In a continuous probability distribution, the set of possible outcomes can take on values in a
continuous range.

For example, the set of real numbers forms a continuous range, so a quantity measured on it,
such as the temperature of the day, is an example of a continuous distribution. Based on these
outcomes we can create a distribution table. A continuous distribution is described by a
probability density function. The formula for the normal distribution is:

f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²))

Where:

μ = mean value
σ = standard deviation
x = normal random variable

If the mean (μ) = 0 and the standard deviation (σ) = 1, the distribution is known as the
standard normal distribution.
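The normal density can be written as a short Python function using the standard formula with mean μ and standard deviation σ; the cross-check against statistics.NormalDist (available since Python 3.8) is included only to verify the arithmetic:

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f(x) = 1/(sigma * sqrt(2*pi)) * exp(-(x - mu)**2 / (2 * sigma**2))"""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Standard normal: density at the mean is 1/sqrt(2*pi) ~ 0.3989
print(f"{normal_pdf(0):.4f}")

# Cross-check against the standard library's implementation
print(math.isclose(normal_pdf(1.5, mu=2, sigma=0.5), NormalDist(2, 0.5).pdf(1.5)))
```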

Normal Distribution Examples


Since the normal distribution approximates many natural phenomena so well, it has become a
standard of reference for many probability problems. Some examples are:

 Heights of people in the population of the world
 The average of many dice rolls or coin tosses (by the Central Limit Theorem)
 IQ levels of children
 Income distribution in a country's economy between poor and rich
 The sizes of women's shoes
 Weights of newborn babies
 Average marks of students based on their performance
Continuous Distributions
Continuous distributions are characterized by an infinite number of possible outcomes, together with the probability
of observing a range of these outcomes. In the following example, there are an infinite number of possible operation
times between the values 2.0 minutes and 8.0 minutes. Twenty percent of the time the operation will take from 2.0
to 3.5 minutes, 40% of the time the operation will take from 3.5 to 5.0 minutes, 30% of the time the operation will
take from 5.0 to 6.0 minutes, and 10% of the time the operation will take from 6.0 minutes to 8.0 minutes.
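The operation-time example can be simulated in Python. One assumption is added here that the text does not state: within each range, times are taken to be uniformly distributed.

```python
import random

# (lower bound, upper bound, probability) for each operation-time range from the text
ranges = [
    (2.0, 3.5, 0.20),
    (3.5, 5.0, 0.40),
    (5.0, 6.0, 0.30),
    (6.0, 8.0, 0.10),
]

def sample_operation_time(rng=random):
    """Pick a range according to its probability, then (by assumption)
    a uniform value inside that range."""
    r = rng.random()
    cumulative = 0.0
    for low, high, p in ranges:
        cumulative += p
        if r <= cumulative:
            return rng.uniform(low, high)
    return ranges[-1][1]          # guard against floating-point round-off

random.seed(0)
times = [sample_operation_time() for _ in range(10_000)]
share = sum(2.0 <= t < 3.5 for t in times) / len(times)
print(f"Share of operations in [2.0, 3.5): {share:.3f}")   # close to 0.20
```

Drawing many simulated times and counting how often they fall in each range recovers the stated probabilities, which is a quick sanity check on the distribution.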

What is Normal Distribution?

The normal distribution is also referred to as Gaussian or Gauss


distribution. The distribution is widely used in natural and social sciences.
It is made relevant by the Central Limit Theorem, which states that the
averages obtained from independent, identically distributed random
variables tend to form normal distributions, regardless of the type of
distributions they are sampled from.

A Poisson distribution is a discrete probability distribution. It gives the probability of an
event happening a certain number of times (k) within a given interval of time or space.
The Poisson distribution has only one parameter, λ (lambda), which is the mean number
of events.

What is a Poisson distribution?


A Poisson distribution is a discrete probability distribution, meaning that it gives the
probability of a discrete (i.e., countable) outcome. For Poisson distributions, the discrete
outcome is the number of times an event occurs, represented by k.

You can use a Poisson distribution to predict or explain the number of events occurring
within a given interval of time or space. “Events” could be anything from disease cases
to customer purchases to meteor strikes. The interval can be any specific amount of
time or space, such as 10 days or 5 square inches.

You can use a Poisson distribution if:

1. Individual events happen at random and independently. That is, the probability of one
event doesn’t affect the probability of another event.
2. You know the mean number of events occurring within a given interval of time or space.
This number is called λ (lambda), and it is assumed to be constant.
When events follow a Poisson distribution, λ is the only thing you need to know to
calculate the probability of an event occurring a certain number of times.
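The Poisson probability P(X = k) = λ^k e^(−λ) / k! can be computed directly. The shop-purchases scenario below is a hypothetical example with λ = 3:

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) = lam**k * exp(-lam) / k!"""
    return (lam ** k) * math.exp(-lam) / math.factorial(k)

lam = 3  # hypothetical: a shop averages 3 customer purchases per hour
for k in range(6):
    print(f"P(X = {k}) = {poisson_pmf(k, lam):.4f}")

# Probabilities over all possible k sum to 1
total = sum(poisson_pmf(k, lam) for k in range(100))
print(f"sum = {total:.6f}")
```

A property worth noticing in the output: with λ = 3, P(X = 2) equals P(X = 3), since the Poisson pmf peaks at the two integers surrounding a whole-number mean.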

Sampling Distribution
 
A sampling distribution in statistics is the probability distribution of a statistic
calculated from random samples drawn from a given population. It is the
distribution of sample statistics across samples that leads to the revelation
of data in numerous fields.

Although individual samples may deviate from the population's mean value, the
frequency distribution of the sampling distribution often approximates a normal
distribution, with most samples close to the population's mean value.
 
Types of Sampling Distribution

1. Sampling Distribution of Mean


 
The first and foremost type of sampling distribution is that of the mean. This
type focuses on calculating the means of all samples, which together form the
sampling distribution.

The mean of every sample is put together, and the mean of the sampling
distribution is calculated, which reflects the nature of the whole population.

With larger sample sizes, the standard deviation of the sample means (the
standard error) decreases, which leads to a normal frequency distribution, a
bell-shaped curve, on the graph.
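The shrinking spread of sample means can be demonstrated by simulation; the population below is invented for the sketch, and the factor σ/√n is the standard-error prediction being checked:

```python
import random
import statistics

random.seed(1)

# Invented population: 100,000 values with mean ~50 and standard deviation ~10
population = [random.gauss(50, 10) for _ in range(100_000)]

def stdev_of_sample_means(n, draws=2_000):
    """Standard deviation of the means of `draws` random samples of size n."""
    means = [statistics.mean(random.sample(population, n)) for _ in range(draws)]
    return statistics.pstdev(means)

se_small = stdev_of_sample_means(n=5)
se_large = stdev_of_sample_means(n=50)
print(f"n=5:  stdev of sample means ~ {se_small:.2f}  (theory: 10/sqrt(5)  = 4.47)")
print(f"n=50: stdev of sample means ~ {se_large:.2f}  (theory: 10/sqrt(50) = 1.41)")
```

Increasing the sample size from 5 to 50 shrinks the spread of the sample means by roughly √10, exactly as the standard-error formula σ/√n predicts.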
 
 
2. Sampling Distribution of Proportion
 
When it comes to the second type of sampling distribution, samples from the
population are used to obtain the proportions of a population. Herein, the
mean of all sample proportions is calculated, and thereby the sampling
distribution of the proportion is generated.

As the proportion of a population is defined by the part of the population
that possesses a certain attribute, the sampling distribution of the proportion
aims to estimate that population proportion through the mean of all sample
proportions.
 
 
3. T-Distribution
 
Third, the t-sampling distribution is used when the sample size is small and
little or no information is available about the population's standard deviation.
Under these conditions the statistic follows a t-distribution, which resembles
the normal distribution but has heavier tails.

Most of the frequency in this distribution is concentrated near the mean of the
sampling distribution; only a handful of samples lie far from the mean value of
the whole population.
