
BUSINESS STATISTICS

MODULE : 1
BASIC CONCEPTS

Prepared by: Bhumi Vyas
(Asst. Prof.)
CHAPTER 1 – Basics of Statistics
• CONTENTS

 Basics of Statistics
 Function of Statistics
 Scope of Statistics
 Limitation of Statistics
 Basic Statistical Concepts
Basics of statistics
• Introduction:
Nature created variation and thereby
generated the need for the subject of statistics which
essentially exists only because of variation in data.
Ex: Height, Weight, Marks, Percentage, Income,
Sales of the companies, Prices of Stocks.
• Statistics is the science of dealing with numbers.
Example:
1. I walk 4 km/day on average.
2. There is a 60 percent chance that a particular political party
will get re-elected at the next election.

• It is used for collection, summarization, presentation and
analysis of data.
• The word “Statistics” is used in two senses.
• In the plural sense: as merely a collection of data or figures,
like agriculture statistics, banking statistics, income-tax
statistics.
• In the singular sense: as a subject like Economics,
Mathematics, etc.
DEFINITION
• A collection of methods for planning experiments, obtaining
data, organizing, summarizing, presenting, analyzing, interpreting,
and drawing conclusions based on the data.
• Statistics may be defined as: “the subject concerned with
scientific method for collecting, summarizing, presenting &
analyzing data as well as drawing conclusions or making
prediction on the basis of such analysis.”
FUNCTIONS OF STATISTICS

• It represents data in a definite form:
 It makes statements which are precise and in quantitative
terms.
 Statements such as "India is an overpopulated country" or
"a country has a high level of poverty" are general and do not
convey any precise meaning.
 Words such as high, low, good, bad are unclear and are
interpreted differently by different people.
 Statements of facts made in exact quantitative terms are more
convincing.
• It simplifies complex data or a mass of figures
 Statistics not only helps one to express data in a
concrete definite form but also reduces the data to a few
significant figures which give the essence of the issue
under consideration.

• It facilitates comparison
 Facts by themselves have little value unless they are
seen in the correct context in which they occur.
 The purpose of statistics is to enable comparison
between past and present results to ascertain the reasons
for changes, which have taken place and the effect of
such changes in future.
• It helps in forecasting
 The future is uncertain. Statistics helps in forecasting
trends and tendencies. Statistical techniques, e.g. a
regression line, are used for predicting the future values
of a variable.
 A producer forecasts his future production on the basis
of the present demand conditions and his past
experiences.

• It helps in formulating policies


 Statistics helps in formulating plans and policies in
different fields. Statistical analysis of data forms the
beginning of policy formulations.
 Hence, statistics is essential for planners, economists,
scientists and administrators to prepare different plans
and programs.
SCOPE OF STATISTICS

• There is hardly any area where statistics has not been
applied, whether it be trade, industry, commerce, economics,
life sciences or education.
• However, certain fields have used statistics very frequently and
effectively. We list some of the important fields.

 State
 Economics
 Business Management
STATISTICS & STATE

• Statistics have been used by governments in framing policies
on the basis of data about population, military, crimes,
education, etc.
• The present day governments have special departments which
maintain a variety of data of significance to the state.
• Besides Central & State Governments, other departments such
as Central Statistical Organization (CSO), National Sample
Survey Organization (NSSO) and the Registrar General of
India (RGI), regularly collect data for the purpose of
analyzing effectiveness of various policies of government.
STATISTICS & ECONOMICS

 Statistics plays an important role in economics. Economics largely
depends upon statistics. National income accounts are
multipurpose indicators for economists and administrators.
 In economic research, statistical methods are used for collecting
and analyzing the data and for testing hypotheses.
 The relationship between supply and demand is studied by
statistical methods. Imports and exports, the inflation rate and
per capita income are problems which require a good knowledge
of statistics.
STATISTICS & BUSINESS

 Statistics plays an important role in business. A successful
businessman must be very quick and accurate in decision
making.
 He must know what his customers want, and should therefore
know what to produce and sell and in what quantities.
 Statistics helps a businessman to plan production according to
the tastes of the customers. The quality of the products can also
be checked more efficiently by using statistical methods.
 So all the activities of the businessman are based on
statistical information.
 He can make correct decisions about the location of the
business, marketing of the products, financial resources,
etc.
 In all areas of business, statistics is widely used such as:
o Marketing
o Production
o Finance
BASIC STATISTICAL CONCEPTS

• Data
• The word data is the plural of the word datum, which in Latin
means "something given", i.e. a fact.
• It is a collection of observations expressed in numerical
quantities.
• Any raw collection of facts and figures which is not yet
meaningful to the user is known as data.
• Data is always used in the collective sense and not in the singular.
• Population
• The word population in statistics means the totality of
the set of objects under study.
• Statistical populations are used in order to observe
behaviors, trends, and patterns in the way individuals in
a defined group interact with the world around them,
allowing statisticians to draw conclusions about the
characteristics of the subjects of study
• An example of population is over eight million people
living in New York City.
• Sample
• A subset of the population.
• A sample is a selected number of entities or individuals
which form a part of the population under study.
• The study of a sample is more practical and economical
in most situations where the population is large and is
used to make conclusions about the entire population.
• Characteristics
• The word characteristic means an aspect possessed by
an individual entity.
• We may study the rainfall of a certain region, or the
marks scored by students in a certain school. These are
referred to as characteristics.
• Variable & Attributes
• In statistics, characteristics are of two types: measurable
and non-measurable.
• Measurable characteristics are those that can be
quantified and expressed in numerical terms. The
measurable characteristics are known as variables.
• A non-measurable characteristic is qualitative in nature
and cannot be quantified. Such characteristics, e.g.
nationality, religion, etc., are called attributes.
Data Collection Method:
Primary Method:
• Primary data are those which are collected directly from
the field of enquiry for a specific purpose. These are raw
data, original in nature and directly collected from the
population. The data can be collected
through two methods.
a) Census method: the census method is a study of the entire
population; data are collected about each individual
of the population.
b) Sample survey method: the sample survey method is a
study of a representative part of the population (a sample).
Collection of Primary Data:
• By direct personal observation:
This can be done by meeting and interrogating
people who may supply the desired information.
• By indirect oral investigation:
the information is collected not by
questioning the concerned people but by asking
people connected with the concerned people.
These people can be called as witnesses who have
knowledge about the persons concerned or
situation involved.
• By sending questionnaires by mail or
email:
A collection of questions relevant to the
area of study is created and sent by post or email
to selected people with a request to fill it in and
return it by post or email.
• By sending schedules through paid
investigators:
A schedule is a form in which information is
noted by the person who asks the questions. The
investigators fill in the schedules on the
basis of answers given by the respondents.
Secondary Method:
• Secondary data are information which has already
been collected by some agency for a specific purpose.
• The same data are primary when collected by the source
agency and become secondary when used by any other
agency.
Collection of Secondary Data:
a) Published sources:
• Government publications: C.S.O., N.S.S.O.
• International publications: U.N.O.'s Statistical Yearbook
• Reports of committees and commissions:
Kothari Commission report on education reforms
• Private publications: journals & newspapers,
research publications
TABULATION OF DATA
• Tabulation is the process of summarizing classified or
grouped data in the form of a table. A Table is a
systematic arrangement of classified data in columns
and rows.
Type of Tables
• 1. Simple or one-way table.
• 2. Two-way table.
• 3. Manifold table.
Example 1:
• The number of students in AVS College in the year 1990 was 500, of which 200 were rural students. In 1991, the
number of students increased by 150 and the number of urban students increased by 75. In 1992, the number of rural
students increased by 20%, while the total number of students increased by 50%. Tabulate the above data.
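One way to work out the figures for Example 1 before drawing the table is sketched below. It assumes, since the question does not say so explicitly, that each increase is measured against the previous year's figure; the variable names are illustrative only.

```python
# Assumption: every increase is relative to the previous year's figure.
total_1990, rural_1990 = 500, 200
urban_1990 = total_1990 - rural_1990          # the rest are urban students

total_1991 = total_1990 + 150                 # total rises by 150
urban_1991 = urban_1990 + 75                  # urban rises by 75
rural_1991 = total_1991 - urban_1991

rural_1992 = rural_1991 * 120 // 100          # rural rises by 20%
total_1992 = total_1991 * 150 // 100          # total rises by 50%
urban_1992 = total_1992 - rural_1992

# Two-way table: year against type of student
table = [
    (1990, rural_1990, urban_1990, total_1990),
    (1991, rural_1991, urban_1991, total_1991),
    (1992, rural_1992, urban_1992, total_1992),
]

print(f"{'Year':<6}{'Rural':>7}{'Urban':>7}{'Total':>7}")
for year, rural, urban, total in table:
    print(f"{year:<6}{rural:>7}{urban:>7}{total:>7}")
```

Under a different reading of the question (for example, percentages measured against 1990), the 1992 row would change.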
Frequency Distribution
Class Boundaries
• The smallest and the largest values in a class interval of
exclusive classes of frequency distribution are known as
class boundaries. They are obtained as follows from
inclusive classes:
• Lower class boundary= lower class limit - (d/2)
• Upper class boundary= upper class limit + (d/2)
where d is the difference between the upper class limit
of any class – interval and the lower class limit of the
next class interval
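The two boundary formulas can be sketched in a few lines. The inclusive classes used here are hypothetical values chosen only for illustration, and `class_boundaries` is an illustrative helper name.

```python
def class_boundaries(limits):
    """Convert inclusive class limits to class boundaries.

    limits: list of (lower_limit, upper_limit) pairs in ascending order.
    """
    # d = gap between the upper limit of one class and the lower limit of the next
    d = limits[1][0] - limits[0][1]
    return [(lower - d / 2, upper + d / 2) for lower, upper in limits]


# Hypothetical inclusive classes, for illustration only
inclusive = [(130, 134), (135, 139), (140, 144)]
boundaries = class_boundaries(inclusive)
print(boundaries)
```

With d = 1, each lower limit moves down by 0.5 and each upper limit moves up by 0.5, so adjacent classes share a common boundary.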
Class mid-point
• Mid value = (upper limit + lower limit) / 2
Frequency Distribution:
Following is the data on ‘Output’ produced by 50 workers. Prepare a
frequency distribution in 7 classes. Also prepare cumulative frequency distribution of
less than type.

110 175 161 157 155 108 164 128 114 128 165 133    

195 151 71 94 87 37 30 62 130 156 167 124    

164 146 116 149 104 141 103 204 162 149 74 113    

69 121 93 143 140 144 187 184 197 87 35 122 203 148
• k = 7
• Minimum value = 30
• Maximum value = 204
• So, Range R = Max value – Min value
= 204 – 30
= 174
• Class width C = R/k = 174/7 ≈ 25
• k is generally between 5 and 15
Classes    Tally marks    Frequency    Relative Frequency    Percentage Frequency

30 – 55        

55 – 80        

80 – 105        

105 – 130        

130 – 155        

155 – 180        

180 – 205        
Less Than Cumulative Distribution
Less than cumulative series    Calculation    No. of workers

Less than 55

Less than 80

Less than 105

Less than 130

Less than 155

Less than 180

Less than 205
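The tally for the 50 workers can be checked with a short script. This is a sketch that assumes exclusive classes (lower limit included, upper limit excluded) starting at 30 with width 25, as in the table above; the variable names are illustrative.

```python
import math
from itertools import accumulate

# Output of the 50 workers, copied from the question
output = [
    110, 175, 161, 157, 155, 108, 164, 128, 114, 128, 165, 133,
    195, 151, 71, 94, 87, 37, 30, 62, 130, 156, 167, 124,
    164, 146, 116, 149, 104, 141, 103, 204, 162, 149, 74, 113,
    69, 121, 93, 143, 140, 144, 187, 184, 197, 87, 35, 122, 203, 148,
]

k = 7
value_range = max(output) - min(output)        # 204 - 30
width = math.ceil(value_range / k)             # 174 / 7 rounded up
start = min(output)
classes = [(start + i * width, start + (i + 1) * width) for i in range(k)]

# Exclusive classes: a value x belongs to a class when lower <= x < upper
n = len(output)
freq = [sum(lower <= x < upper for x in output) for lower, upper in classes]
relative = [f / n for f in freq]
percentage = [100 * f / n for f in freq]
less_than = list(accumulate(freq))             # 'less than' cumulative frequencies

for (lower, upper), f, r, p, cf in zip(classes, freq, relative, percentage, less_than):
    print(f"{lower:>3} - {upper:<3}  f={f:<3} rel={r:.2f}  pct={p:.0f}%  less than {upper}: {cf}")
```

The last cumulative frequency should equal the number of observations, which is a quick check that no value was missed.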


Prepare a continuous frequency distribution; also prepare ‘less
than’ and ‘more than’ cumulative frequency distributions

126 131 113 82 75 204 81 84 118 104

78 90 115 110 98 106 99 107 84 76

119 93 187 139 129 130 68 195 123 125

110 80 107 111 141 136 123 90 186 82

100 109 128 115 107 115 111 92 86 70


• k = 8
• Minimum value = 68
• Maximum value = 204
• So, Range R = Max value – Min value
= 204 – 68
= 136
• Class width C = R/k = 136/8 = 17
• For easy calculation we will take C = 20
Classes    Tally marks    Frequency    Relative Frequency    Percentage Frequency
60 – 80        

80 – 100        

100 – 120        

120 – 140        

140 – 160        

160 – 180        

180 – 200        

200 – 220
HW sum: Prepare ‘less than’ and
‘more than’ series.
Marks obtained by students    No. of students
20 – 30 6
30 – 40 18
40 – 50 25
50 – 60 22
60 – 70 17
70 – 80 12
Total 100
GRAPHS:
• A large variety of graphs are used in practice.
Here we will be discussing the graphs of
frequency distributions only.
• A frequency distribution can be presented
graphically in any of the following ways:
1) Histogram
2) Frequency Polygon
3) ‘Ogive’ or cumulative frequency curves.
Histogram:
• It is a graph of a frequency distribution in which
the class intervals are plotted on x-axis and their
respective frequencies on y-axis
• On each class a rectangle is drawn , the height of
each rectangle is taken to be equal to the
frequency of the corresponding class .
• The construction of such a Histogram is shown
in the following example.
Example
• Draw a Histogram & Frequency polygon from
the following distribution giving marks of 50 students in statistics.
Marks in Statistics   0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80  80-90
No. of students       0     2      3      7      13     13     9      2      1
Frequency Polygon:
• Frequency polygons are more suitable than
histograms whenever two or more frequency
distributions are to be compared.
• The frequencies of the classes are plotted against
the mid-values of the corresponding classes .
• The points so obtained are joined by straight
lines to obtain the frequency polygon.
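The points of a frequency polygon can be computed before plotting. The sketch below uses the marks of 50 students from the histogram example; the variable names are illustrative, and the actual drawing is left to graph paper or a plotting tool.

```python
# Class intervals and frequencies from the histogram example (50 students)
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50),
           (50, 60), (60, 70), (70, 80), (80, 90)]
freq = [0, 2, 3, 7, 13, 13, 9, 2, 1]

# Mid-value of each class = (upper limit + lower limit) / 2
mid_values = [(lower + upper) / 2 for lower, upper in classes]

# Each polygon vertex is (mid-value, frequency); join them with straight lines
points = list(zip(mid_values, freq))

for x, y in points:
    print(f"mid-value {x:>5}: frequency {y}")
```

The histogram uses the same frequencies as rectangle heights, so the polygon passes through the mid-point of the top of each rectangle.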
HW: Draw a frequency polygon and
histogram:
Monthly Income (in Rs.)    No. of Families
0 – 500 10
500 – 1000 15
1000 – 1500 18
1500 – 2000 12
2000 – 2500 8
2500 – 3000 4
Draw a Histogram & Frequency Polygon for the
following data:
Marks in Statistics   17-19  19-21  21-23  23-25  25-27  27-29  29-31
No. of students       7      13     24     30     22     15     16
Ogive:
• An ogive is a graphic presentation of the cumulative
frequency (c.f.) distribution of a continuous
variable. It consists of plotting the c.f. (along the
y-axis) against the class boundaries (along the x-axis).
Since there are two types of c.f., there are two types
of ogive:
• ‘less than and equal to’ Ogive
• ‘more than and equal to’ Ogive
Example: less than and more than
cumulative frequency distribution
Marks    No. of Students    Less than Cumulative Frequency
0 – 10 2
10 – 20 3
20 – 30 6
30 – 40 11
40 – 50 12
50 – 60 15
60 – 70 10
70 – 80 7
80 – 90 4
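The cumulative columns for this example can be filled in as sketched below. The sketch assumes the usual convention that the 'less than' c.f. is plotted against upper class boundaries and the 'more than' c.f. against lower class boundaries; variable names are illustrative.

```python
from itertools import accumulate

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50),
           (50, 60), (60, 70), (70, 80), (80, 90)]
freq = [2, 3, 6, 11, 12, 15, 10, 7, 4]
total = sum(freq)

# 'Less than' c.f.: running total of frequencies up to each upper boundary
less_than_cf = list(accumulate(freq))

# 'More than' c.f.: observations at or above each lower boundary
more_than_cf = [total - before for before in [0] + less_than_cf[:-1]]

# Points to plot for the two ogives
less_than_points = [(upper, cf) for (_, upper), cf in zip(classes, less_than_cf)]
more_than_points = [(lower, cf) for (lower, _), cf in zip(classes, more_than_cf)]

for (lower, upper), f, lt, mt in zip(classes, freq, less_than_cf, more_than_cf):
    print(f"{lower:>2} - {upper:<2}  f={f:<3} less than {upper}: {lt:<3} more than {lower}: {mt}")
```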
Univariate Analysis:
Measures of Central Tendency
Measures of Variation
Measures of Central Tendency

 In almost every statistical data set there is a tendency
for most of the values to concentrate at the center,
which is referred to as ‘central tendency’.
 The typical values which measure the central
tendency are called measures of central tendency,
commonly known as ‘averages’.
Merits & Demerits of Arithmetic Mean:

Merits of Arithmetic Mean .


(1) It is simple to calculate & easy to understand
(2) It is based on each and every observation of the
series
(3) It is capable of further mathematical treatment.
(4) It is least affected by sampling fluctuations.

Demerits of Arithmetic Mean .


(1) It is very much affected by extreme observations.
(2) It cannot be used in case of open-end classes.
(3) It cannot be determined by inspection, nor can it be
located graphically.
(4) It is not a good measure of central value for
qualitative data.
Question:
• Find the arithmetic mean for the following data
representing marks of 60 students.
Marks             10-20  20-30  30-40  40-50  50-60  60-70  70-80
No. of Students   8      15     13     10     7      4      3
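For grouped data the arithmetic mean is usually taken as the sum of (frequency × mid-value) divided by the total frequency. That formula is not written out in the slides, so the sketch below treats it as the assumed method; variable names are illustrative.

```python
# Marks of 60 students, from the question above
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
freq = [8, 15, 13, 10, 7, 4, 3]

# Each class is represented by its mid-value
mid_values = [(lower + upper) / 2 for lower, upper in classes]

n = sum(freq)                                          # total number of students
sum_fm = sum(f * m for f, m in zip(freq, mid_values))  # sum of f * m
mean = sum_fm / n

print(f"N = {n}, sum of f*m = {sum_fm}, mean = {mean:.2f}")
```

The same steps apply to the heights question that follows, using the mid-values of its classes.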
Question:
Calculate the arithmetic mean of heights of 80
Students for the following data.

Height            130-134  135-139  140-144  145-149  150-154  155-159  160-164
No. of Students   7        11       15       21       16       6        4
Average of Position:
• The positional measures of central tendency, namely median,
quartiles, deciles, percentiles, and mode, are used.

• Median:
• Median may be defined as the middle value in the
data set when elements are arranged in sequential
order of scale.
• Median is a measure of location or centrality of the
observation.
• The median can be calculated for both grouped and
ungrouped data sets.
Median: Question

Xi fi

20 6

9 4

25 16

50 7

40 8

80 2
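For an ungrouped frequency distribution like the one above, the median can be located by sorting the values and accumulating frequencies until the middle position is reached. The helper names below are illustrative, not part of the original slides.

```python
def value_at(pairs, k):
    """Return the k-th smallest observation (1-indexed) from sorted (value, freq) pairs."""
    running = 0
    for value, f in pairs:
        running += f
        if running >= k:
            return value
    raise ValueError("k exceeds the number of observations")


def median_discrete(values, freqs):
    pairs = sorted(zip(values, freqs))       # arrange values in ascending order
    n = sum(freqs)
    if n % 2 == 1:                           # odd N: the (N + 1)/2-th observation
        return value_at(pairs, (n + 1) // 2)
    # even N: average of the N/2-th and (N/2 + 1)-th observations
    return (value_at(pairs, n // 2) + value_at(pairs, n // 2 + 1)) / 2


xi = [20, 9, 25, 50, 40, 80]
fi = [6, 4, 16, 7, 8, 2]
median = median_discrete(xi, fi)
print(f"N = {sum(fi)}, median = {median}")
```

Here N = 43, so the median is the 22nd observation in ascending order.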
Question:

• The following table gives the weekly
expenditure of 100 families. Find the median
weekly expenditure.
Weekly expenditure (Rs)    No. of Families (f)
0 – 10 14

10 – 20 23

20 – 30 27

30 – 40 21

40 – 50 15
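For grouped data the median is commonly found by interpolation inside the median class: Median = L + ((N/2 − c.f.) / f) × h, where L is the lower boundary of the median class, c.f. the cumulative frequency before it, f its frequency and h its width. This formula is not stated in the slides, so the sketch below rests on that assumption; `grouped_median` is an illustrative name.

```python
def grouped_median(classes, freq):
    """Interpolated median for continuous classes given as (lower, upper) pairs."""
    n = sum(freq)
    half = n / 2
    cf_before = 0                            # cumulative frequency before the current class
    for (lower, upper), f in zip(classes, freq):
        if cf_before + f >= half:            # first class whose c.f. reaches N/2
            return lower + (half - cf_before) / f * (upper - lower)
        cf_before += f
    raise ValueError("empty distribution")


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
families = [14, 23, 27, 21, 15]
median = grouped_median(classes, families)
print(f"Median weekly expenditure = Rs {median:.2f}")
```

With N = 100, the median class is 20 – 30 (c.f. before it is 37), so the interpolation adds (50 − 37)/27 of the class width to 20.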
Merits & Limits of Median:
 Merits:
 It is useful measure of central value especially in case
of open ended classes.
 It is most suitable measure in case of qualitative data
such as beauty, intelligence, honesty, etc.
 It is not affected by extreme values.
 The value of median can be determined graphically
whereas the value of mean can not be determined
graphically.
 Limitations:
 For calculating median it is necessary to arrange the
data in ascending or descending order of scale.
 Since it is positional average it is not based on all the
observation.
 It is affected by sampling fluctuation.
Mode:
1)For Raw data:
Mode is the value which occurs most frequently in
a set of observations. It is the value which is repeated
the maximum number of times and is denoted by Z.

2) For ungrouped frequency distribution:


Mode is the value of the variable corresponding to
the highest frequency.

3) For Grouped data:


In a Continuous distribution first the modal class
is determined. The class interval corresponding to
the highest frequency is called modal class.
Merits & Limitation of Mode:
 Merits:
 Mode is not affected by extreme value observation.
 It can be used to describe qualitative data like
consumer’s preference of particular brand of shirt.
 Value of mode can be determined graphically.

 Limitation:
 The value of the mode cannot always be determined,
and in some cases a distribution has multiple modes.
 It is not capable of further analysis, i.e. using the modes of
two sets of data we cannot calculate a combined mode.
 It is not based on all observations.
Example: 1

 In Rajdhani Rubber Industry, Tilak Nagar, New
Delhi, seven laborers are receiving the daily wages
of Rs. 5, 6, 6, 8, 8, and 10.
Example: 2

Daily Wages (Rs) No. of Employees

20 – 40 21

40 – 60 28

60 – 80 35

80 – 100 40

100 – 120 24

120 – 140 18

140 – 160 10
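Once the modal class is identified, the mode of a grouped distribution is usually estimated with the interpolation formula Z = L + ((f1 − f0) / (2·f1 − f0 − f2)) × h, where f1 is the frequency of the modal class and f0, f2 are the frequencies of the classes before and after it. The slides do not state this formula, so it is an assumption of the sketch below.

```python
# Daily wages data from Example 2
classes = [(20, 40), (40, 60), (60, 80), (80, 100), (100, 120), (120, 140), (140, 160)]
employees = [21, 28, 35, 40, 24, 18, 10]

# Modal class = class with the highest frequency
i = employees.index(max(employees))
lower, upper = classes[i]

f1 = employees[i]
f0 = employees[i - 1] if i > 0 else 0                    # class before the modal class
f2 = employees[i + 1] if i < len(employees) - 1 else 0   # class after the modal class

mode = lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)
print(f"Modal class {lower} - {upper}, mode = Rs {mode:.2f}")
```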
Measures of Dispersion
• The measures of central tendency describe the central
value, called the average, around which the values in the
data set tend to cluster. But these measures do not
reveal how these values are spread or scattered on
each side of the central value.
• Just as central tendency can be measured by a number in
the form of an average, the amount of variation
(spread or scatter) among the values in the data set
can also be measured.
Measures of Dispersion:

• Techniques that are used to measure the extent of
variation or deviation of each value in the data set from
a measure of central tendency are called measures of
dispersion.
Measures of Dispersion
•Absolute Measure: These measures are described by a
number (or value) to represent the amount of variation
among values in a data set. Such a value is expressed in
the same unit of measurement such as a rupee, inch, foot,
kg, etc.
•Relative Measures: These measures are described as the
ratio of a measure of absolute variation to an average and
is termed as Coefficient of Variation. The word
‘coefficient’ means a number that is independent of any
unit of measurement.
Range:
Quartile Deviation:
• The dependence of the range on extreme items can be
avoided by adopting this measure. Quartiles together
with the median are the points that divide the whole
series of observations into approximately four equal
parts so that quartile measures give a rough idea of
the distribution on either side of the average.
Interquartile Range or Deviation
• The IQR measures the spread within the middle half of the values in the data
set so as to minimize the influence of extreme values in the calculation of
the range. Since a large number of values in a data set lie in the central part of the
frequency distribution, it is necessary to study the interquartile range.
• IQR = Q3 – Q1
• Half the distance between Q3 and Q1 is known as Quartile Deviation.
• QD = (Q3 – Q1) / 2
• Coefficient of QD = (Q3 – Q1) / (Q3 + Q1)
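As an illustration, these measures can be computed for the weekly expenditure of 100 families used in the median question earlier. The quartiles are obtained by the same interpolation idea as the grouped median (an assumed method, since the slides do not give a quartile formula), and `grouped_quantile` is an illustrative helper name.

```python
def grouped_quantile(classes, freq, p):
    """Interpolated value below which a fraction p of the observations lie."""
    target = p * sum(freq)
    cf_before = 0
    for (lower, upper), f in zip(classes, freq):
        if cf_before + f >= target:          # class containing the target position
            return lower + (target - cf_before) / f * (upper - lower)
        cf_before += f
    raise ValueError("empty distribution")


# Weekly expenditure (Rs) of 100 families
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
families = [14, 23, 27, 21, 15]

q1 = grouped_quantile(classes, families, 0.25)
q3 = grouped_quantile(classes, families, 0.75)

iqr = q3 - q1                       # interquartile range
qd = (q3 - q1) / 2                  # quartile deviation
coeff_qd = (q3 - q1) / (q3 + q1)    # coefficient of quartile deviation

print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}, QD = {qd:.2f}, coefficient = {coeff_qd:.3f}")
```

The coefficient is unit-free, so it can be compared across data sets measured in different units.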
Standard Deviation
• In 1893 Karl Pearson first introduced the concept. It is
considered as one of the best measures of dispersion as
it satisfies the requisites of a good measure of
dispersion. The standard deviation measures the
variability of a distribution.
• “Greater the amount of variability , greater is the value
of standard deviation”
• In simple language a small value of standard deviation
means greater uniformity or consistency of the data and
homogeneity of the distribution.
• It is due to this reason that standard deviation is
considered as a good indicator of the representativeness
of the mean.
• It is represented by SD, and also by the Greek
letter sigma (σ).
• The square of the standard deviation is called variance
and is represented by σ².
• The coefficient of variation (CV) is a relative measure of
dispersion: CV = (σ / mean) × 100. It is mostly used to study the share prices of
two or more companies to compare the relative
consistency of the prices. It helps a genuine
investor (in shares) in selecting a share whose price
is relatively more stable. Thus shares with a lower CV,
i.e. with more consistent prices,
will be preferred by him.
Find SD , Variance and CV
Month Company A Company B Company C
Jan 9.5 15.5 3.3
Feb 13.7 21.2 5.7
March 10.4 23.4 8.9
April 8.6 18.8 2.6
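A sketch of the calculation for the three companies follows. It assumes the population formulas (dividing by N rather than N − 1), since the slide does not say which is intended, and it uses Python's standard `statistics` module.

```python
from statistics import mean, pstdev, pvariance

# Monthly figures (Jan to April) from the table above
prices = {
    "Company A": [9.5, 13.7, 10.4, 8.6],
    "Company B": [15.5, 21.2, 23.4, 18.8],
    "Company C": [3.3, 5.7, 8.9, 2.6],
}

summary = {}
for name, values in prices.items():
    m = mean(values)
    sd = pstdev(values)                    # population standard deviation (divide by N)
    summary[name] = {
        "mean": m,
        "variance": pvariance(values),     # square of the standard deviation
        "sd": sd,
        "cv": 100 * sd / m,                # coefficient of variation in percent
    }

# The lowest CV indicates the most consistent series
most_consistent = min(summary, key=lambda name: summary[name]["cv"])

for name, s in summary.items():
    print(f"{name}: mean={s['mean']:.3f} variance={s['variance']:.4f} "
          f"SD={s['sd']:.4f} CV={s['cv']:.2f}%")
print("Most consistent:", most_consistent)
```

If the sample formulas are wanted instead, `stdev` and `variance` from the same module can be swapped in; the ranking by CV is unchanged for this data because all three series have the same number of values.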
Mean Absolute Deviation:
• This measure takes into account the whole data.
When it is calculated by averaging the deviations
of individual items from their arithmetic mean,
taking all deviations to be positive, the measure is
called mean absolute deviation.
• This measure can be used by a company for doing
research work.
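The definition can be illustrated with the Company A figures from the table above, used here only as sample data.

```python
# Company A figures (Jan to April), reused as sample data
values = [9.5, 13.7, 10.4, 8.6]

arithmetic_mean = sum(values) / len(values)

# Deviations from the mean, all taken as positive
abs_deviations = [abs(x - arithmetic_mean) for x in values]

# Mean absolute deviation = average of the absolute deviations
mad = sum(abs_deviations) / len(values)

print(f"Mean = {arithmetic_mean:.2f}, mean absolute deviation = {mad:.3f}")
```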
