Uploaded by kalidasdurge

MB0024-Unit-01-Understand the usefulness of Statistics

Introduction

consumer behaviour, varied expectations of variety of consumers and new market

openings, modern managers have the difficult task of making quick and appropriate decisions. Therefore they need to depend more upon quantitative techniques like mathematical models, statistics, operations research and econometrics. These techniques push back the domain of ignorance and rule of thumb and open up new horizons of thought.

Learning Objective 1

Our day-to-day activities interact with personnel, public, social, political, economic,

business and other environments. Decision making encompasses all these activities.

Suppose we wish to purchase a television we would like to know the price, quality,

durability, maintainability etc. Therefore there is a need for collecting data and making an

optimum decision. Again suppose a company wishes to introduce a new product, it has to

collect data on market potential, consumer likings, availability of raw materials,

feasibility of producing the product etc. In other words data collection is the back-bone of

any decision making process. Many organizations find themselves rich in data but poor at drawing information from them. Therefore it is important to develop the ability to extract meaningful information from raw data to make better decisions. Statistics plays an important role in this respect.

Learning Objective 2

Commerce, Business, Economics, Industry, Insurance, Sociology, Psychology etc.

In Biology, Medicine and Agriculture, Statistical methods are applied in the study of

growth of plant, movement of fish population in the ocean, migration of birds, effect of

newly invented medicines, theories of heredity, estimation of yield of crop, effect of

fertilizer on yield, birth rate, death rate, population growth, growth of bacteria etc. The

insurance premiums are based on the age composition of the population and the mortality

rates. Actuarial science deals with the calculation of insurance premiums and dividends.

Statistics is a part and parcel of Economics, Commerce and Business. Statistical analysis

of variations of price, demand and production are helpful to businessmen and economists.

Cost of living index numbers help in economic planning and fixation of wages. They are

used to estimate the value of money. Analysis of demand, price, production cost,

inventory costs etc., help in decision making in business activities. Management of

limited resources and labour needs statistical methods to maximize profit. Planned

recruitments and distribution of staff, proper quality control methods, careful study of

demand for goods in the market, balance investment, etc. help the producer to extract

maximum profit out of minimum capital. In industries, statistical quality control

techniques help in increasing and controlling the quality of products at a minimum cost.

A government’s administrative system is fully dependent on production statistics, income

statistics, labour statistics, economic indices of cost, price, etc. Economic planning of any

nation is entirely based on statistical facts. Statistics has become so important today that

hardly any science exists independently of it; hence the statement ‘Science without Statistics bears no fruit; Statistics without Science has no root’.

Learning Objective 3

analysis and interpretation of numerical data.’ Thus, Statistics contains the tools and

techniques required for the collection, presentation, analysis and interpretation of data.

This definition is precise and comprehensive.

marked extent by multiplicity of causes, numerically expressed, enumerated or estimated

according to a reasonable standard of accuracy, collected in a systematic manner for a

predetermined purpose and placed in relation to each other.

Characteristics of Statistics

Statistics deals with aggregates of facts: A single figure cannot be analyzed. Thus, the fact ‘Mr Lee is 170 cm tall’ cannot be statistically analyzed. On the other hand, if we know the heights of 60 students of a class, we can comment upon the average height, variation, etc.

1. Statistics are affected to a marked extent by multiplicity of causes: The statistics

of yield of paddy is the result of factors such as fertility of soil, amount of rainfall,

quality of seed used, quality and quantity of fertilizer used, etc.

2. Statistics are numerically expressed: Only numerical facts can be statistically

analyzed. Therefore, facts as ‘price decreases with increasing production’ cannot

be called statistics.

3. Statistics are enumerated or estimated according to reasonable standards of

accuracy: The facts should be enumerated (collected from the field) or estimated

(computed) with required degree of accuracy. The degree of accuracy differs from

purpose to purpose. In measuring the length of screws, an accuracy upto a

millimeter may be required, whereas, while measuring the heights of students in a

class, accuracy upto a centimeter is enough.

4. Statistics are collected in a systematic manner: The facts should be collected

according to planned and scientific methods. Otherwise, they are likely to be

wrong and misleading.

5. Statistics are collected for a pre-determined purpose: There must be a definite purpose for collecting facts, e.g. the movement of the wholesale price of a commodity.

6. Statistics are placed in relation to each other: The facts must be placed in such a

way that a comparative and analytical study becomes possible. Thus, only related

facts which are arranged in logical order can be called statistics.

Learning Objective 4

• It makes comparison easier

• It brings out trends and tendencies in the data

• It brings out hidden relations between variables.

• Decision making process becomes easier.

Learning Objective 5

1. Statistics does not deal with qualitative data. It deals only with quantitative data.

2. Statistics does not deal with individual facts: Statistical methods can be applied only to aggregates of facts.

3. Statistical inferences (conclusions) are not exact: Statistical inferences are true

only on an average. They are probabilistic statements.

4. Statistics can be misused and misinterpreted: Increasing misuse of Statistics has

led to increasing distrust in statistics.

5. Common men cannot handle Statistics properly: Only statisticians can handle

statistics properly.

With the advent of computers, many statistical programmes are available in the market. They help us in summarizing, presenting and analyzing mass data in a short time. Some of them are Minitab, SPSS, Texto & Contexto, Excel, E-View etc.

SUMMARY

The decision making process becomes more efficient with the help of Statistics. It deals with aggregates of facts. It is applied in all fields of our activities. Its interpretation requires a skilled and experienced statistician.

MB0024-Unit-02

Introduction

policies according to existing nature of a population, to find the relationship between

characteristics of units in the population etc. require collection and analysis of data in a

systematic manner. In other words a search for knowledge by analyzing numerical data is

known as Statistical Survey or Statistical investigation.

Learning Objective 1

statistical survey is divided into two broad categories.

A. Planning B. Execution

The relevance and accuracy of data obtained in a survey depends upon the care exercised

in planning. A properly planned investigation can lead to best results with least cost and

time. The planning stage consists of the following sequence of activities.

ambiguous manner.

2. Objectives of investigation should be stated at the outset. Objectives could be to obtain certain estimates, to establish a theory, to verify an existing statement, to find relationships between characteristics, etc.

3. The scope of investigation has to be made clear. It refers to area to be covered,

identification of units to be studied, nature of characteristics to be observed,

accuracy of measurements, analytical methods, time, cost and other resources

required.

4. Whether to use data collected from primary or secondary source should be

determined in advance.

5. The organization of investigation is the final step in the process. It encompasses

the determination of number of investigators required, their training, supervision

work needed, funds required etc.

investigation to check the accuracy, coverage, methods of measurements, analysis

and interpretation.

and graphs, analyzed and interpreted.

Learning Objective 2

called units or individuals.

population otherwise it is known as infinite population.

characteristic.

characteristics.

years in a ward of a municipality. The number of houses in the ward is finite.

Therefore the population is finite. The objects are households. The characteristic measured is the number of children below 16 years in a household. It is measurable and hence quantitative. On the other hand, if the survey is to find the number of blind people in a locality, the population is finite, the objects are individuals and the characteristic is blindness, which is qualitative.

h. In a population some characteristics remain the same for all units and some

others vary from unit to unit. The quantitative characteristic that varies from

unit to unit is called a Variable. The qualitative characteristic that varies from

unit to unit is called an Attribute. A variable that assumes only some specified

values in a given range is known as Discrete Variable. A variable that assumes

all the values in the range is known as Continuous Variable.

Examples:

Learning Objective 3

a. Collection of data is the first and most important stage in any statistical

survey

such as objective, scope, nature of investigation, availability of resources.

c. Data collected for the first time, keeping in view the objective of the survey, are known as primary data. They are likely to be more reliable. However, the cost of collecting such data is much higher.

methods.

direct contact with units of investigation. The accuracy of data depends upon

the ability, training and attitude of the investigator. This method is suitable

where i) The scope of investigation is narrow, ii) Investigation requires

personal attention of the investigator, iii) Investigation is confidential and iv)

Accuracy of data is important.

The advantages are: i) the data are original, ii) they are more accurate and reliable, iii) satisfactory information can be extracted by the investigator through indirect questions, iv) the data are homogeneous and comparable, v) additional information can be gathered and vi) misinterpretation of questions can be avoided. However, it consumes more time and money.

f. Indirect oral interview is used when area to be covered is large. The data

is collected from a third party or witness or head of institution. This method is

generally used by police department.

Advantages are i) economical in terms of time, cost and man power, ii)

confidential information can be collected, iii) information is likely to be

unbiased and reliable. However the degree of accuracy of information is less.

correspondents are generally adopted by newspaper and T.V. Local agents are

appointed in different parts of the area under investigation. They send the

desired information at regular intervals.

It is used where the area to be covered is very large and periodic information

is required. The information is likely to be affected by the bias of the

correspondents or agencies.

questionnaires are filled by of questions pertaining to the investigation. They

are sent to the respondents with a covering letter soliciting cooperation by

giving correct information and mailing it back. The objectives of investigation

are explained in the covering letter together with assurance for keeping

information provided by them as confidential.

This method is generally adopted by research workers and other official and non-

official agencies. It covers large area of investigation. It is more economical and

free from investigator’s bias. However it results in many “non-response”

situations. The respondent may be illiterate. They can provide wrong information

due to wrong interpretation of questions.

proper drafting of the questionnaire. Following general principle are considered.

ii. Lengthy questions should be avoided.

v. It should be unambiguous.

through personal contact. In order to get reliable information, the investigator

should be well trained, tactful, unbiased and hard working.

contact. The problem of non-response is minimized.

k. Information used for the investigation of the current problem, but obtained from data collected and used earlier by some other agency or person for their own investigation, is known as secondary data.

are available in research papers, news papers, magazines, government

publication, international publication, websites etc. They are collected for a

different purpose. Therefore care should be exercised while making use of it.

Their accuracy, reliability, objectives and scope should be examined

thoroughly before use.

with respect to each and every individual of the population is observed.

Whereas secondary data may be collected either by census or sampling

methods.

gives a measure of efficiency of the Questionnaire. It reduces the

inconveniences and loss of information. It helps us to introduce necessary

changes.

Before using the data collected it should be checked for its completeness,

accuracy and reliability. By complete we mean that all the required information

should be available.

SUMMARY

A Statistical survey is a search for knowledge. There are two main stages in any

statistical survey, namely, planning and execution. Planning encompasses i)

nature of problem, ii) the objectives, iii) the scope, iv) statistical units, v) degree

of accuracy, vi) period, vii) source of information and viii) organization.

MB0024-Unit-03

Introduction

Collected data in raw form would be voluminous and incomprehensible. Therefore they should be condensed and simplified for better understanding and usefulness. Classification is the first stage in simplification.

characteristics.

Each of the groups is called a class. For example, in a survey of industrial workers of a particular industry, workers can be classified as unskilled, semi-skilled and skilled, each of which forms a class.

Learning Objective 1

Functions of classification

Requisites of good classification are

ii. Exhaustive: every unit should be allotted to one and only one class

viii. Revealing: Should bring out essential features of the collected data.

Types of classification

2. Chronological classification: Data are classified according to the time of its

occurrence.

3. Conditional classification: Data are classified according to certain conditions.

4. Qualitative classification: Classification of data that are non- measurable. E.g.

Sex of a person, marital status, colour etc.

5. Quantitative classification: Classification of data that are measurable either in

discrete or continuous form.

6. Statistical Series: Data arranged logically according to size or time of occurrence

or some other measurable or non-measurable characteristics.

Methods of Classification

way classification.

classification.

iii. Classification done according to more than two attributes or variables is known as

manifold classification.

iv. Examples:

1. One-way classification

No. of students who secured more than 60 % in various sections of same course

3. Manifold classification.

Classification of employees according to skill, sex and education.

Note: G: Graduate

NG: Non-Graduate

Tabulation

columns.

vi. To facilitate further analysis

In spite of the fact that they are closely related, the differences are as follows.

Learning Objective 2

Parts of a Table.

ii. Title: It indicates the scope and the nature of contents in concise form.

vi. Ruling and Spacing: They separate columns and rows. However totals are

separated from main body by thick lines.

vii. Head Note: It is given below the title of the table to indicate the units of

measurement of the data and enclosed in brackets.

viii. Source Note: It indicates the source from which data is taken.

Types of Table

Tables are classified on the basis of

reference to the collected data. They are formed without specific objective, but

can be used for any specific purpose. They contain large mass of data. Example:

Census.

ii. Specific purpose table or text table or summary table deals with specific

problems. They are smaller in size and they highlight relationship between

characteristics. Example: Cost of living indices.

i. Primary Table: They contain data in the form in which they were originally collected. Ref. Table – 1.

ii. Derived Table: They represent figures like totals, averages, ratios etc. derived from original data. Ref. Table – 2.

Table – 1

Departments.

Departments   Age 20 – 40                                Age 40 and above                           Total
              Undergraduate   Graduate   Post Graduate   Undergraduate   Graduate   Post Graduate
Accounts      10              40         10              10              15         5               90
Finance       10              30         10              12              14         7               83
Personal      15              25         10              10              14         5               79
Production    10              30         10              8               12         6               76
Marketing     5               25         10              0               15         7               62
Total         50              150        50              40              70         30              390

Table – 2

Departments   Age 20 – 40   Age 40 & above
Accounts      2.564         1.282
Finance       2.564         1.795
Personal      3.846         1.282
Production    2.564         2.051
Marketing     1.282         1.795
Total         12.820        8.205

iii. The cross-classified Table: entries are classified in both directions. Ref. Table No. 5.

i. Simple Table

Table No.3

1 15

2 20

3 40

4 50

Table No.4

Distribution of Defectives according to Batch and Nature of defective

Batch    Defects
         Major   Minor
I        8       7
II       15      5
III      25      15
Total    40      27

Table No.5

Population of a city according to age, sex and education during 2003 to 2005

Year   Sex      Below 20 yrs   20 – 40   Above 40   Total   Below 20 yrs   20 – 40   Above 40   Total
2003   Male
       Female
2004   Male
       Female
2005   Male
       Female

Learning Objective 3

a. The number of units associated with each value of the variable is called

frequency of that value. Suppose the variable takes the value 15 and the value 15

occurs 3 times then 3 is called the frequency of the value 15.
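The idea of a frequency can be illustrated with a few lines of Python. This is only a sketch: the list `values` is hypothetical data chosen so that the value 15 occurs 3 times, as in the description above.

```python
from collections import Counter

# Hypothetical observations of a discrete variable (not taken from the text).
values = [15, 12, 15, 18, 12, 15, 20]

# Counter maps each distinct value to the number of times it occurs,
# which is exactly the frequency of that value.
frequency = Counter(values)

for value in sorted(frequency):
    print(value, frequency[value])
```

Listing each distinct value with its frequency in this way gives a discrete frequency distribution.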

corresponding frequencies is called a Frequency Distribution of the variable. It is

presented in Tabular form called as Frequency Table. If class intervals are not

present, then it is called a discrete frequency distribution Ref table – 6. A frequency

distribution formed with class-intervals is called a continuous frequency

distribution. Ref table – 7

ranges called class-intervals. Class intervals have lower and upper limits known as

lower class limit and upper class limits. The differences between upper class limit

and lower class limit is termed as class width. The middle value of a class interval is

called mid-value of the class. It is the average of class limits.

Table – 6

0 15

1 20

2 22

3 16

4 7

Total 80

Table – 7

0 – 20 15

20 – 40 20

40 – 60 28

60 – 80 22

80 – 100 15

Total 100

10 is the Lower class interval

The class interval that does not include upper class limit is called Exclusive type of class

interval. The class-interval that includes the upper class limits is called Inclusive – type

of class interval.

Inclusive Type

Marks

0 – 9 15

10 – 19 20

Exclusive Type

The class 0 – 10 does not include the value 10. If the value of 10 occurs, it is included in

the class 10 – 20.

e. From a given frequency distribution we can form five derived frequency distributions. They are i) Relative frequency distribution, ii) Percentage frequency distribution, iii) Frequency density distribution, iv) Less than cumulative frequency distribution, v) More than cumulative frequency distribution.

If “f” is the class frequency and “N” is the total frequency, the relative frequency

distribution is formed by calculating f/N. Total will always be one.

The percentage frequency distribution is formed by multiplying the ratio f/N by 100.

If “c” is the width of the class-interval and “f” is the frequency of the class, then

frequency density distribution is formed by calculating f/c.

The less than cumulative frequency distribution is formed with number of observations

which are less than a given value.

The more – than cumulative distribution is formed with number of observations which

are more than a given value.

example. The derived frequency distribution is as follows.

Table – 8

Marks      Relative frequency   Frequency density   Percentage
           distribution         distribution        distribution
0 – 20     0.15                 0.75                15
20 – 40    0.20                 1.00                20
40 – 60    0.28                 1.40                28
60 – 80    0.22                 1.10                22
80 – 100   0.15                 0.75                15
Total      1.00                 -                   100 %

Table 8 (a)

0 0

20 15

40 35

60 63

80 85

100 100

Table 8 (b)

0 100

20 85

40 65

60 37

80 15

100 0
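The five derived distributions can be recomputed from the marks distribution of Table 7 with a short Python sketch. The variable names are illustrative only; the formulas are f/N, 100·f/N and f/c as defined earlier, plus running totals for the two cumulative distributions.

```python
# Class intervals and frequencies of the marks distribution (Table 7).
classes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
freqs = [15, 20, 28, 22, 15]

N = sum(freqs)  # total frequency

relative = [f / N for f in freqs]          # f / N
percentage = [100 * f / N for f in freqs]  # (f / N) * 100
density = [f / (hi - lo) for (lo, hi), f in zip(classes, freqs)]  # f / c

# Less than cumulative frequency, read at each upper class limit.
less_than = []
running = 0
for f in freqs:
    running += f
    less_than.append(running)

# More than cumulative frequency, read at each lower class limit.
more_than = [N - c + f for c, f in zip(less_than, freqs)]

print(less_than)
print(more_than)
```

The two cumulative lists correspond to the entries of Tables 8 (a) and 8 (b) at the class limits 20 to 100 and 0 to 80 respectively.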

Frequency distribution of more than two variables is known as multi – variate frequency

distribution. If the number of variables is only two then it is called bivariate frequency

distribution. A bivariate frequency distribution will have two Marginal Distributions and

“m+n” conditional distribution.

Table 9

9,000 – 12,000 12,000 – 15,000 15,000 – 18,000 Total

20 – 30 10 3 - 13

30 – 40 8 12 3 23

40 – 50 6 15 10 31

50 – 60 - 3 18 21

Total 24 33 31 88

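Numbers in the last column represent the marginal distribution of age, and numbers in the last row represent the marginal distribution of salary. Any row of the table represents the conditional distribution of salary for a given age group. Any column represents the conditional distribution of age for a given salary group.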
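As a rough illustration, the marginal and conditional distributions of Table 9 can be obtained as row and column totals. The sketch below assumes that the rows of Table 9 are age groups, the columns are salary groups, and that cells printed as "-" are zero.

```python
# Joint frequencies from Table 9: rows are age groups, columns are salary groups.
# Cells shown as "-" in the table are assumed to be zero.
ages = ["20-30", "30-40", "40-50", "50-60"]
salaries = ["9,000-12,000", "12,000-15,000", "15,000-18,000"]
table = [
    [10, 3, 0],
    [8, 12, 3],
    [6, 15, 10],
    [0, 3, 18],
]

# Marginal distribution of age: row totals.
age_marginal = [sum(row) for row in table]

# Marginal distribution of salary: column totals.
salary_marginal = [sum(col) for col in zip(*table)]

# Conditional distribution of salary for the age group 30-40 (one row),
# expressed as proportions of that row's total.
row = table[ages.index("30-40")]
salary_given_age = [f / sum(row) for f in row]

print(age_marginal, salary_marginal)
```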

ii. The number of class intervals is given by Sturges' Rule, viz. K = 1 + 3.322 log N, where N is the total number of observations and the logarithm is to base 10.
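A minimal sketch of Sturges' Rule, using its usual form K = 1 + 3.322 log10 N (equivalently 1 + log2 N). The function name `sturges_classes` and the choice to round to the nearest whole number are assumptions made for illustration.

```python
import math

def sturges_classes(n):
    """Suggested number of class intervals for n observations (Sturges' Rule)."""
    # Rounding to the nearest integer is an assumption; some texts round up.
    return round(1 + 3.322 * math.log10(n))

print(sturges_classes(100))
```

The result is only a starting point; in practice the class width is then adjusted to a convenient round figure.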

Note: In Practice divide the range either by 2 or 5 or 10 or multiples of 10 such that the

number of class intervals will be between 7 and 15. Avoid open-end class interval. Make

sure that class-intervals do not overlap. Tally marks are used to construct frequency

Table. Tally Mark is a small vertical line drawn against a class as soon as we observe a

value belonging to the class. The fifth tally mark is crossed for easy counting purposes.

A class interval that does not prescribe a lower limit for the first class or an upper limit for the last class is known as an open-end class interval.

Learning Objective 4

a. Top Management and common man do not have time to go through mass data and

understand its nature. For them diagrammatic and graphical presentations are more

intelligible, attractive and appealing. They give a bird’s eye-view of the data. They

facilitate comparison of various aspects of data. They create ever lasting impressions.

However they can not be considered as alternatives for numerical data. Mathematical

calculations are not possible. They do not give accurate values.

dimensional we have Bar Diagrams. In two dimensions we have pie diagram. Different

Bar diagrams are simple bar diagram, component Bar diagram, sub-divided Bar diagram,

Percentage Bar diagram etc.

Table 10

showing students’ composition.

Note: It is easier to draw the diagram if we first find the cumulative total for each section.

They are drawn when we have two or more sets of comparable values.

Example:

Simple Bar Diagram: It is drawn when items are to be compared with respect to a single

characteristic. A rectangular bar is constructed with height proportional to the magnitude

of the items.

Example:

Represent the following data regarding the yield / acre of paddy in Karnataka over the

last five years.

Yield 20 22 25 27 30

Component (sub-divided) Bar Diagram: They are used when two or more

characteristics are observed on a unit. Each Bar is proportionally subdivided.

Example:

Product A

2002 – 2003 40 70

2003 – 2004 45 85

2004 – 2005 55 90

a. Component Pie Diagram: It is drawn when data have magnitudes for two or more

components. Circles with area proportional to magnitudes are drawn to represent the total

magnitude. Then circles are divided sector-wise according to the magnitude of the

components.

If T is the total magnitude and R is the magnitude of a component, then the angle at the centre is given by $\frac{R}{T} \times 360^\circ$.

Example:

Draw pie – diagram for the following data regarding expenses of two families.

Monthly Expenses of

Items

Family A Family B

Food 2000 4000

Rent 1000 1500

Fuel 500 1000

Misc 500 1500

Total 4000 8000

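Since the areas of the circles must be proportional to the totals, the radii are proportional to the square roots of the totals: $\sqrt{4000} \approx 63.2$ and $\sqrt{8000} \approx 89.4$. Taking 1 cm = 50 units, we draw two circles with radii of about 1.3 cm and 1.8 cm. The angles at the centre are determined as follows.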

Items    Family A   Family B
Food     180°       180°
Rent     90°        67.5°
Fuel     45°        45°
Misc     45°        67.5°
Total    360°       360°
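The angles and radii of the two pie diagrams can be checked with a short Python sketch. The helper names `pie_angles` and `pie_radius` are made up for this illustration; the scale of 1 cm = 50 units is the one used in the example.

```python
import math

expenses = {
    "Family A": {"Food": 2000, "Rent": 1000, "Fuel": 500, "Misc": 500},
    "Family B": {"Food": 4000, "Rent": 1500, "Fuel": 1000, "Misc": 1500},
}
SCALE = 50  # 1 cm = 50 units, as in the example

def pie_angles(items):
    """Angle at the centre for each component: (R / T) * 360 degrees."""
    total = sum(items.values())
    return {name: value / total * 360 for name, value in items.items()}

def pie_radius(items):
    """Radius in cm; area is proportional to the total, so radius ~ sqrt(total)."""
    return math.sqrt(sum(items.values())) / SCALE

angles_a = pie_angles(expenses["Family A"])
angles_b = pie_angles(expenses["Family B"])
print(angles_a, round(pie_radius(expenses["Family A"]), 1))
print(angles_b, round(pie_radius(expenses["Family B"]), 1))
```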

Graphical Presentation

i. Histogram

area proportional to class frequency.

If the class intervals have equal width then the variable is taken along X-axis and

frequency along Y-axis and a rectangle is constructed.

Example: For the following distribution of Age the histogram is drawn as follows.

We join the upper left corner of highest rectangle to the right adjacent rectangle’s left

corner and right upper corner of highest rectangle to left adjacent rectangle’s right corner.

From the intersecting point of these lines we draw a perpendicular to the X-axis. The X-

reading at that point gives the mode of the distribution.

If the widths of the rectangles are not equal then we make areas of rectangles

proportional and draw the histogram as follows

Frequency Polygon: The mid values of class-intervals are plotted against frequency of

the class interval. These points are joined by straight lines.

Frequency Curve: First we draw histogram

for the given data. Then join the mid points of the rectangles by a smooth curve. Total

area under frequency curve represents total frequency. They are the most useful form of

frequency distribution.

Ogives

1. Less than-ogive: Variables are taken along X-axis and less than cumulative

frequencies are taken along Y-axis. Less than cumulative frequencies are plotted

against upper limit of class interval and joined by a smooth-curve.

2. More than Ogive: More than cumulative frequencies are plotted against lower

limit of the class-interval and joined by a smooth-curve.

From the meeting point of these two ogives if we draw a perpendicular to X-axis,

the point where it meets X-axis gives Median of the distribution.

Example: Draw ogive for the following data and hence determine Median

Class     Frequency   Upper limit   Less than c.f.   Lower limit   More than c.f.
0 – 10    5           10            5                0             50
10 – 20   10          20            15               10            45
20 – 30   20          30            35               20            35
30 – 40   12          40            47               30            15
40 – 50   3           50            50               40            3
Total     50                                         50            0

Note: With help of ogive we can find all positional values of a distribution. It

gives at a glance percentage of readings that will lie below or above a specified

value.

SUMMARY

systematic manner according to common characteristics. Classification simplifies

and makes data more comprehensible and renders the data ready for statistical

analysis.

Classified data is tabulated in rows and columns for presentation, using various

types of classification. Frequency distribution is a special type of tabulation. In

more concise form it brings out the salient features of the distribution.

Data presented in diagram or graph form are more appealing and give a rough idea of the situation to busy executives.

MB0024-Unit-04

Introduction

Mass data, which are collected, classified, tabulated and presented systematically, are analyzed further to reduce them to a single representative figure.

The tendency of data to cluster around a figure is known as central tendency. Measures of central tendency, or averages of the first order, describe the concentration of large numbers around a value. Such a measure is a single value which represents all units.

Learning Objective 1

Understand the concepts of Central tendency

Statistical Averages

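values and is represented by $\bar{X}$.

Example: Arithmetic mean of 15, 17, 22, 21, 19, 26, 20 is given by

$$\bar{X} = \frac{15 + 17 + 22 + 21 + 19 + 26 + 20}{7} = \frac{140}{7} = 20$$

i. For discrete data with frequency it is given by

$$\bar{X} = \frac{\sum fX}{\sum f}$$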

Example:

Example

No. of students: 50 65 80 55

Solution:

=

i. Algebraic sum of deviations of a set of values taken from their Mean is always Zero, i.e. $\sum (X - \bar{X}) = 0$.

ii. Sum of squares of deviations of a set of values from their mean is always minimum.

iii. It is capable of further algebraic treatment. If $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$ are the means of k sets of values of sizes $n_1, n_2, \ldots, n_k$, then their combined arithmetic mean is given by

$$\bar{X} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \cdots + n_k\bar{X}_k}{n_1 + n_2 + \cdots + n_k}$$

Example: If average height of 30 men is 158 cm and average height of another group of

40 men is 162 cm. Find average height of combined group

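Given $n_1 = 30$, $\bar{X}_1 = 158$

$n_2 = 40$, $\bar{X}_2 = 162$

$$\bar{X} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2} = \frac{30 \times 158 + 40 \times 162}{70} = \frac{11220}{70} = 160.28 \text{ cm (truncated to two decimals)}$$

Note: In the above example, given any four of the five values $n_1$, $n_2$, $\bar{X}_1$, $\bar{X}_2$ and $\bar{X}$, we can find the fifth value. For instance, suppose $\bar{X}_1$ is unknown and we are given

$n_1 = 30$

$n_2 = 40$, $\bar{X}_2 = 162$

$\bar{X} = 160.28$

Then, $160.28 = \frac{30\bar{X}_1 + 40 \times 162}{70}$

Or, $30\bar{X}_1 + 6480 = 11219.6$

or, $30\bar{X}_1 = 4739.6$

∴ $\bar{X}_1 = \frac{4739.6}{30} \approx 157.98$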
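The combined mean of the two groups of men can be verified with a small Python sketch. The function name `combined_mean` is illustrative; it simply weights each group mean by its group size.

```python
def combined_mean(sizes, means):
    """Combined arithmetic mean of several groups, weighted by group size."""
    total = sum(n * m for n, m in zip(sizes, means))
    return total / sum(sizes)

# Two groups of men: 30 with mean height 158 cm and 40 with mean height 162 cm.
height = combined_mean([30, 40], [158, 162])
print(height)
```

The printed value is 11220 / 70, which the text reports to two decimals.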

Example: In an office there are 84 employees. The distribution of their salaries is as follows.

Salary (X)      2430   2590   2870   3390   4720   5160
Employees (f)   4      28     31     16     3      2

Solution

Salary (X)   Employees (f)   fX
2430         4               9720
2590         28              72520
2870         31              88970
3390         16              54240
4720         3               14160
5160         2               10320
Total        84              249930

i. Mean = $\frac{\sum fX}{\sum f} = \frac{249930}{84} = 2975.36$
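The same calculation can be sketched in Python as a frequency-weighted mean, using the salary values and frequencies from the solution table. The variable names are illustrative.

```python
# Salaries (X) and number of employees (f) from the solution table.
salaries = [2430, 2590, 2870, 3390, 4720, 5160]
employees = [4, 28, 31, 16, 3, 2]

# Sum of f * X divided by the sum of f.
total_fx = sum(f * x for x, f in zip(salaries, employees))
mean_salary = total_fx / sum(employees)

print(total_fx, round(mean_salary, 2))
```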

Example: The following data is related to the marks scored by students of a class in an

examination. Calculate the mean.

Percentage marks   Less than 10   Less than 20   Less than 30   Less than 40   Less than 50   Less than 60   Less than 70
No. of students    4              16             20             65             85             97             100

distribution as follows.

Example 8: Average weight of 100 screws in box “A” is 10.4 gms. It is mixed with 150

screws of box “B”. Average weight of mixed screws is 10.9 gms. Find the average weight

of screws of box “B”

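Solution:

Given $n_1 = 100$, $\bar{X}_1 = 10.4$, $n_2 = 150$, $\bar{X}_2 = ?$, $\bar{X} = 10.9$

We know

$$\bar{X} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}, \quad \text{so} \quad 10.9 = \frac{100 \times 10.4 + 150\bar{X}_2}{250}$$

Solving we get, $150\bar{X}_2 = 2725 - 1040 = 1685$, so $\bar{X}_2 \approx 11.23$ gms.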

Example: Find the missing frequency for the following distribution given the mean value

as 129.

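Class Interval   80 – 100   100 – 120   120 – 140   140 – 160   160 – 180   Total
Frequency        8          -           26          14          10          80

Mid value X   f        fX
90            8        720
110           f        110f
130           26       3380
150           14       2100
170           10       1700
Total         58 + f   7900 + 110f

Since the mean is 129,

$$\frac{7900 + 110f}{58 + f} = 129$$

so $7900 + 110f = 7482 + 129f$, which gives $19f = 418$ and $f = 22$.

Missing frequency is 22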
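The missing frequency can also be found by solving the mean equation directly. In the sketch below, the function name `missing_frequency` and the use of `None` to mark the unknown entry are assumptions made for illustration.

```python
def missing_frequency(mids, freqs, target_mean):
    """Solve for the single unknown frequency (marked None) given the mean."""
    unknown = freqs.index(None)
    known_n = sum(f for f in freqs if f is not None)
    known_fx = sum(m * f for m, f in zip(mids, freqs) if f is not None)
    # target_mean * (known_n + f) = known_fx + mids[unknown] * f, solved for f.
    return (target_mean * known_n - known_fx) / (mids[unknown] - target_mean)

mids = [90, 110, 130, 150, 170]   # mid values of the class intervals
freqs = [8, None, 26, 14, 10]     # None marks the missing frequency
f = missing_frequency(mids, freqs, 129)
print(f)
```

With f = 22 the total frequency becomes 58 + 22 = 80, which agrees with the total given in the question.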

v. It is more stable.

2. It can not be determined for distributions with open-end class intervals.

3. It can not be graphically located.

4. Sometimes it is a value which is not in the series.

Learning Objective 2

Median

The median of a set of values is the middle-most value when they are arranged in ascending order of magnitude, and is denoted by M.

In case of a discrete series, with or without frequency, it is given by M = the $\left(\frac{n+1}{2}\right)$th value.

Note: To solve problems on Median, arrange Data in ascending order or descending order

(2) Make class-interval as exclusive type.

Set values 45, 32, 31, 46, 40, 28, 27, 37, 36, 41, 47, 50

27, 28, 31, 32, 36, 37, 40, 41, 45, 46, 47, 50

f: 4, 9, 3, 5, 4, 2, 10

Solution:

n = 37

M = 15

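In case of continuous series, it is given by

$$M = L + \frac{\frac{N}{2} - \mathrm{cf}}{f_c} \times c$$

where L = lower limit of the median class, N = total frequency, cf = cumulative frequency of the class preceding the median class, c = width of the median class and

$f_c$ = frequency of the median class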

Frequency: 10 15 40 27 8

∴ M = 43.125

144.5 – 149.5 15 15

149.5 – 154.5 22 37

154.5 – 159.5 38 75

159.5 – 164.5 17 92

164.5 – 169.5 8 100

Example: Find the missing frequency for the following data given that its median is 34

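Solution: Since the median is 34, it falls in the class-interval 30 – 40. Let “f” be the missing frequency. Therefore we have

Class     Frequency   Cumulative frequency
0 – 10    4           4
10 – 20   9           13
20 – 30   f           13 + f
30 – 40   20          33 + f
40 – 50   18          51 + f
50 – 60   7           58 + f
60 – 70   3           61 + f

Here the total frequency is 61 + f, the lower limit of the median class is 30, its width is 10, its frequency is 20 and the cumulative frequency before it is 13 + f. Therefore

$$34 = 30 + \frac{\frac{61 + f}{2} - (13 + f)}{20} \times 10$$

Or, $16 = (61 + f) - 2(13 + f) = 35 - f$

Or, f = 35 – 16 = 19

∴ Missing frequency is 19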
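As a check, the median of the completed distribution (with the missing frequency taken as 19) can be computed by linear interpolation within the median class. The function `grouped_median` is a sketch written for this illustration, assuming exclusive-type class intervals.

```python
def grouped_median(classes, freqs):
    """Median of a continuous frequency distribution by linear interpolation."""
    n = sum(freqs)
    cumulative = 0  # cumulative frequency before the current class
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= n / 2:
            # Median class found: L + (N/2 - cf) / f * c
            return lower + (n / 2 - cumulative) / f * (upper - lower)
        cumulative += f

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freqs = [4, 9, 19, 20, 18, 7, 3]  # third entry is the recovered frequency
median = grouped_median(classes, freqs)
print(median)
```

The result matches the median of 34 given in the question.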

Merits of median

2. It is not affected by extreme values.

3. It can be determined Graphically (ogives).

4. It can be used for qualitative data.

5. It can be calculated for distributions with open-end classes.

Demerits of Median

2. It is not capable of further algebraic treatment.

3. It is not based on all values.

Mode:

Mode is the value which has the highest frequency and is denoted by Z.

Modal value is most useful for business people. For example shoe and Ready made

Garment manufacturers will like to know the modal size of the people to plan their

operations.

For discrete data with or without frequency it is that value corresponding to highest

frequency

Example: The following data relate to size of shoes. Find the mode.

6, 7, 6, 8, 9, 9, 9, 10, 8, 7, 7, 9, 10, 9, 9, 9, 8, 8, 11

Where

Example: Praveen apartment builders found the number of customers who wishes to

have plinth area of their apartments as follows:

Find the modal plinth area

Solution: we note that the intervals are exclusive type. Highest frequency is 25.

Therefore corresponding interval is 1200 – 1400, which is called Modal class.

Therefore,

Merits of Mode

2. It is not affected by extreme values.

3. It can be calculated for distributions with open end classes.

4. It can be located graphically.

5. It can be used for qualitative data.

Demerits of Mode

7. It is not capable of further mathematical treatment.

8. It is much affected by sampling fluctuations.

Geometric Mean

The geometric mean of a series of “n” positive numbers is given by

GM =

GM =

Where n = f1 + f2 + ………….. + fn

GM =

Where n = f1 + f2 + …………. + fn and x1, x2 are the mid points of class intervals.

Example: The growth in bad-debt expense for Das office supply company over the last

few years is as follows. Calculate the average percentage increase in bad-debt expense

over this time period.

Solution:

G.M =

= 1.09675

Whenever data deal with rates, ratios, growth rate, etc Geometric mean is the best

measure

Note: Geometric mean is not defined even if one of the value is zero or negative.

Harmonic Mean

If x1, x2, …………xn are “n” values for discrete series without frequency then their

Harmonic Mean

X        1/X
9.7      0.1031
9.8      0.1020
9.5      0.1053
9.4      0.1064
9.7      0.1031
Total    0.5199

∴ HM = 5 /(0.5199) = 9.6172
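As a cross-check, the harmonic mean of the five values can be computed directly. This is a minimal sketch that uses the definition HM = n / Σ(1/X) and, for comparison, the standard-library function statistics.harmonic_mean.

```python
from statistics import harmonic_mean

values = [9.7, 9.8, 9.5, 9.4, 9.7]

# Definition: number of values divided by the sum of their reciprocals.
hm_manual = len(values) / sum(1 / x for x in values)

# Standard-library equivalent.
hm_library = harmonic_mean(values)

print(round(hm_manual, 4), round(hm_library, 4))
```

Working with unrounded reciprocals gives a value of about 9.62; it differs from 9.6172 only in the fourth decimal place because the table rounds each reciprocal to four places.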

f 5 25 36 37 20

Solution:

X f f/X

121 5 0.04132

122 25 0.20492

123 36 0.29268

124 37 0.29839

125 20 0.16000

Total 123 0.99731

Example : In a locality the distribution of average speed of birds in the evening were

observed to be as follows. Find average speed of birds using harmonic mean.

Class –

80 – 82 82 – 84 84 – 86 86 – 88

Interval

Frequency 5 7 3 2

Solution:

Mid x    f     f/x
81       5     0.06173
83       7     0.08434
85       3     0.03529
87       2     0.02299
Total    17    0.20435

∴ HM = N / Σ(f/x) = 17 / 0.20435 = 83.19

→ Median is the midvalue of series of data. It divides the distribution into two equal

portions. Similarly we can divide a given distribution into four, ten or hundred or any

other number of equal portions.

Quartiles: When distribution is divided into four equal portions, then we get first quartile

(Q1), second Quartile (Q2 = Median) and third quartile (Q3) as the positional averages.

For discrete series with or without frequency, Q1 is given by the ((N + 1)/4)th value and Q3 is

given by the (3(N + 1)/4)th value.

Example : Weekly sales of a product in 8 different shops are as follows. Calculate the

Quartiles.

Sales in units: 309, 312, 305, 307, 310, 308, 308, 306

Solution:

Q1 = 306.25

Weighted Averages: Suppose the values x1, x2, ……. xn are assigned the weights w1,

w2………wn then their weighted average is given by

and their weighted Geometric Mean is given by

Learning Objective 3

Dispersion:

distribution of weights of a product produced by two machines.

Machine A B

Sample size 1000 1000

Average wt 80 80

Minimum wt 20 40

Maximum wt 140 100

Machine B produces products with weights much closer to the average than Machine A.

As a manufacturer or customer we would choose Machine B. In other words we choose

that machine whose spread is smaller.

The property of deviations of values from the average is called Dispersion or Variations.

The degree of variations is found by the measures of variations.

1. Range (R)

2. Quartile Deviations ( Q.D)

3. Mean Deviations (M.D)

4. Standard Deviations (S.D)

They have units of measurement attached to them. Therefore they are known as absolute

measures of variations. However we may want to compare two different distributions

whose measurements are one in terms of Kg and another in terms of cm. Then we use the

following relative measures that do not have any units attached to them. They are

1. Coefficient of Range

2. Coefficient of Quartile Deviations

3. Coefficient of Mean Deviations

4. Coefficient of Variations.

They are known as Relative measures.

2. It should be based on all values.

3. It should be rigidly defined.

4. It should not be affected by extreme values.

5. It should not be affected by sampling fluctuations.

6. It should be capable of further algebraic treatment.

Range

Range is the difference between highest and lowest value of the data.

L: Lowest value

Merits

Demerits

Use:

2. When the study does not require deep analysis

3. When data has no abnormal values.

Example 30: Find the Range of the following discrete series 26, 28, 28, 26, 28, 30, 27,

29, 26, 24

Solution: R = 30-24 = 6

Frequency 10 15 25 12 8

Solution: R = 25-0 = 25

Note: If the class interval’s are open then Range is not defined.

Quartile Deviations

2. Q3-Q1 gives the middle 50% of reading. Q3 and Q1 are also known as upper and lower

limit of middle 50% of readings.

Example: Find the inter quartile Range, Q.D and coefficient of Q.D for the demand

distribution of toothpaste packs for various price categories.

Frequency 15 25 38 14 8

4.5-9.5 15 15

9.5-14.5 25 40 Q1 class

14.5-19.5 38 78

19.5-24.5 14 92

24.5-29.5 8 100

Total 100

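Q1 = 9.5 + {(25 - 15) X 5}/25 = 11.5 Rs

Q3 = 75th value

Q3 = 14.5 + {(75 - 40) X 5}/38 = 19.11 Rs

Inter quartile Range = Q3 - Q1 = 19.11 - 11.5 = 7.61 Rs

Q.D. = (19.11 - 11.5)/2 = 7.61/2 = 3.805 Rs

Coefficient of Q.D. = (19.11 - 11.5)/(19.11 + 11.5) = (7.61)/(30.61) = 0.2486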
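The interpolation for Q1 and Q3 can be sketched in Python. The code assumes the class limits and frequencies of the table above and the formula Q = L + ((position − c.f.) / f) × h, with position N/4 for Q1 and 3N/4 for Q3; the function name is illustrative. Evaluating the formula directly gives Q1 = 11.5.

```python
def grouped_quantile(classes, position):
    """Value at a given cumulative position in a grouped distribution.

    classes: list of (lower, upper, frequency) tuples in ascending order.
    """
    cumulative = 0
    for lower, upper, f in classes:
        if cumulative + f >= position:
            # Interpolate inside the class that contains the position.
            return lower + (position - cumulative) * (upper - lower) / f
        cumulative += f
    raise ValueError("position exceeds total frequency")

classes = [(4.5, 9.5, 15), (9.5, 14.5, 25), (14.5, 19.5, 38),
           (19.5, 24.5, 14), (24.5, 29.5, 8)]
n = sum(f for _, _, f in classes)

q1 = grouped_quantile(classes, n / 4)
q3 = grouped_quantile(classes, 3 * n / 4)
quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)

print(q1, round(q3, 2), round(quartile_deviation, 3), round(coefficient_qd, 4))
```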

Merits:

2. It is rigidly defined

3. It is not affected by extreme values.

Demerits:

5. It is affected by sampling fluctuations.

6. It is not capable of further algebraic treatment.

Mean Deviation:

It is defined as the mean of absolute deviations of the values from central value.

The Mean deviation from Mean for discrete series without frequency is given by M.D (X̄) = Σ |X – X̄| / n, and for a series with frequencies by

M.D (X̄) = Σ f |X – X̄| / Σf

In case of continuous series “X” represents Mid value of class-interval.

Similarly we can have Mean Deviation from Median or Mode. X is replaced by Median

or Mode in the above formulae.

However Mean Deviation from Median is the least. It is known as Minimal property of

Mean Deviation.

Coefficient of M.D.(X̄) = M.D.(X̄)/X̄

Coefficient of M.D.(Median)=M.D.(Median)/Median

Example: Calculate mean deviation and also coefficient of Mean deviation using i) Mean

and ii) Median. Compare the results.

Solution:

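X        |X – 145|    |X – 143.5|
140      5            3.5
141      4            2.5
142      3            1.5
143      2            0.5
144      1            0.5
145      0            1.5
147      2            3.5
158      13           14.5
Total 1160    30      28.0

∴ Mean = 1160 /8 = 145 and Median = (143 + 144)/2 = 143.5

M.D (Mean) = 30/8 = 3.75 and Coefficient of M.D (Mean) = 3.75/145 = 0.0259

M.D (Median) = 28/8 = 3.5 and Coefficient of M.D (Median) = 3.5/143.5 = 0.0244

The mean deviation from the median is the smaller of the two, as the minimal property states.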
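The two mean deviations can be recomputed with a few lines of Python. This sketch assumes the eight X values of the table and the definition M.D = Σ|X − centre| / n; the helper name is hypothetical.

```python
from statistics import mean, median

data = [140, 141, 142, 143, 144, 145, 147, 158]

def mean_deviation(values, centre):
    """Mean of the absolute deviations of the values from a chosen centre."""
    return sum(abs(x - centre) for x in values) / len(values)

centre_mean = mean(data)
centre_median = median(data)

md_mean = mean_deviation(data, centre_mean)
md_median = mean_deviation(data, centre_median)

print(centre_mean, centre_median, md_mean, md_median)
print(round(md_mean / centre_mean, 4), round(md_median / centre_median, 4))
```

The deviation of 158 from the median 143.5 is 14.5, so the deviations from the median add up to 28 and the mean deviation from the median (3.5) is smaller than that from the mean (3.75).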

Merits

2. It is less affected by extreme values.

3. It is not affected much by sampling fluctuations.

Demerits

2. It does not take into account negative signs.

Uses of M.D

economic, business and social phenomena.

Standard deviation

Measures of dispersion Range and Q.D are not based on all values. Mean deviation based

on all values does not take into consideration the sign. Therefore a measure that removes

both drawbacks is given by standard Deviation (S.D).

The standard deviation of a set of values is the positive square root of mean of the

squared deviations of the values from their arithmetic mean. It is denoted by σ (sigma).

σ = √Variance

σ = √Variance

For (A)

Variance =

σ = √Variance

For (B)

σ = √Variance

by σ 2

Example: Calculate the S.D for Variation in temperature observed during two months at

Bangalore.

Temp (°C)    18   19   20   21   22   23   24   25   Total
Frequency    3    5    8    16   12   8    5    3    60

Solution:

X        f     d = X – 21    fd     fd²
18       3     -3            -9     27
19       5     -2            -10    20
20       8     -1            -8     8
21       16    0             0      0
22       12    1             12     12
23       8     2             16     32
24       5     3             15     45
25       3     4             12     48
Total    60                  28     192
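The column totals lead to the standard deviation through the short-cut (assumed mean) method. The sketch below assumes the formula σ = √[Σfd²/N − (Σfd/N)²] with assumed mean 21, which is equivalent to the definition given earlier (root of the mean squared deviation from the arithmetic mean); variable names are illustrative.

```python
from math import sqrt

temps = [18, 19, 20, 21, 22, 23, 24, 25]
freqs = [3, 5, 8, 16, 12, 8, 5, 3]
assumed_mean = 21

n = sum(freqs)
sum_fd = sum(f * (x - assumed_mean) for x, f in zip(temps, freqs))
sum_fd2 = sum(f * (x - assumed_mean) ** 2 for x, f in zip(temps, freqs))

# Short-cut method: variance = mean of d squared minus square of mean of d.
variance = sum_fd2 / n - (sum_fd / n) ** 2
sigma = sqrt(variance)

print(n, sum_fd, sum_fd2, round(variance, 4), round(sigma, 4))
```

With N = 60, Σfd = 28 and Σfd² = 192 the variance is about 2.98 and σ is about 1.73 °C.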

Example 37: The diastolic blood pressure of men is distributed as follows. Find the

standard deviation and variance.

Pressure(men) 78-80 80-82 82-84 84-86 86-88 88-90

No of Men 3 15 26 23 9 4

Solution:

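Class Interval    Mid value X    Frequency f    d = (X – 83)/2    fd     fd²
78-80             79             3              -2                -6     12
80-82             81             15             -1                -15    15
82-84             83             26             0                 0      0
84-86             85             23             1                 23     23
86-88             87             9              2                 18     36
88-90             89             4              3                 12     36
Total                            80                               32     122

Variance = [Σfd²/N – (Σfd/N)²] × 2² = [122/80 – (32/80)²] × 4 = (1.525 – 0.16) × 4 = 5.46

σ = √5.46 = 2.34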

2. Standard deviation is always greater than or equal to zero.

3. It is the least of all root – mean – square deviations.

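σ1 and σ2 then the combined standard deviation of both the values is given by

σ12 = √[(n1σ1² + n2σ2² + n1d1² + n2d2²)/(n1 + n2)]

Where d1 = X̄1 – X̄12 and d2 = X̄2 – X̄12, X̄12 = (n1X̄1 + n2X̄2)/(n1 + n2) being the combined mean.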

Example: The average weight of 100 apples from area “A” is 150gms with standard

deviation of 10gms. Similarly the average weight of 200 apples from area “B” is 200gms

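with standard deviation of 15gms. Find the combined standard deviation.

Solution:

n1 = 100, n2 = 200

X̄1 = 150, X̄2 = 200

σ1 = 10, σ2 = 15

Therefore, the combined mean is X̄12 = (100 × 150 + 200 × 200)/300 = 183.33, so d1 = 150 – 183.33 = -33.33 and d2 = 200 – 183.33 = 16.67. Hence

σ12 = √[(100 × 10² + 200 × 15² + 100 × 33.33² + 200 × 16.67²)/300] = √738.89 = 27.18 gms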
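The combined standard deviation for the two lots of apples can be computed as follows. The sketch assumes the standard pooling formula, variance = [n1(σ1² + d1²) + n2(σ2² + d2²)] / (n1 + n2), where d1 and d2 are the deviations of the group means from the combined mean; the function name is hypothetical.

```python
from math import sqrt

def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Return (combined mean, combined standard deviation) of two groups."""
    n = n1 + n2
    combined_mean = (n1 * mean1 + n2 * mean2) / n
    d1 = mean1 - combined_mean
    d2 = mean2 - combined_mean
    combined_var = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / n
    return combined_mean, sqrt(combined_var)

# Area A: 100 apples, mean 150 gms, S.D 10 gms; area B: 200 apples, mean 200 gms, S.D 15 gms.
mean12, sd12 = combined_sd(100, 150, 10, 200, 200, 15)
print(round(mean12, 2), round(sd12, 2))
```

The combined spread (about 27.18 gms) is much larger than either group's own standard deviation because the two group means are 50 gms apart.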

Merits

1. It is rigidly defined.

2. It is based on all values.

3. It is capable of further algebraic treatment.

4. It is not very much affected by sampling fluctuations.

Demerits

1. It is difficult to understand.

2. It gives undue weightage to extreme values.

3. It cannot be calculated for classes with open end interval.

pertaining to different characteristic or pertaining to same characteristics, then we

use coefficient of variation. It is a relative measure expressed in percentage and is

defined as

2.

two or more sets of data.

Example: Find Standard deviations of the following two series and state which is

more stable.

Series A: 192, 288, 236, 229, 184, 260, 384, 291, 330, 243

Series B: 31, 48, 13, 51, 38, 43, 50, 36, 47, 82

Solution:

Series A

X        d = X – 260    d²
192      -68            4624
288      28             784
236      -24            576
229      -31            961
184      -76            5776
260      0              0
384      124            15376
291      31             961
330      70             4900
243      -17            289
Total    +37            34247

Therefore, Mean = 260 + 37/10 = 263.7 and σ = √[34247/10 – (37/10)²] = √3411.01 = 58.40

And C.V = (58.40/263.7) × 100 = 22.15

Series B

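X        X²
31       961
48       2304
13       169
51       2601
38       1444
43       1849
50       2500
36       1296
47       2209
82       6724
Total 439    22057

Therefore, Mean = 439/10 = 43.9 and σ = √[22057/10 – (43.9)²] = √278.49 = 16.69

And C.V = (16.69/43.9) × 100 = 38.02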

Since c.v for series A (22.15) is less than c.v for series B (38.02), series A is more

stable.
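The comparison can be reproduced with the standard library. This sketch assumes the population standard deviation (statistics.pstdev) and C.V = (σ / mean) × 100. The sixth value of Series A is taken as 260, which is what the d = x − 260 column (d = 0) and the totals Σd = +37 and Σd² = 34247 imply.

```python
from statistics import mean, pstdev

series_a = [192, 288, 236, 229, 184, 260, 384, 291, 330, 243]
series_b = [31, 48, 13, 51, 38, 43, 50, 36, 47, 82]

def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean."""
    return pstdev(values) / mean(values) * 100

cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)

more_stable = "A" if cv_a < cv_b else "B"
print(round(cv_a, 2), round(cv_b, 2), more_stable)
```

The series with the smaller coefficient of variation is the more stable one, here Series A.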

Example: The bursting and tensile strength of a type of paper showed the

following results

Mean 40 130

S.D 6 15

Which characteristic is more variable?

SUMMARY

in terms of its two important features – i) with respect to nature of data to cluster

around a central value and ii) with respect their spread from their central value.

MB0024-Unit-05

Introduction

Every human activity has an element of uncertainty. Uncertainty affects the decision

making process. We use the word “Probably” very often, like, probably it may rain

today, probably the share price may go up in the next week. Therefore there is a need to

handle uncertainty systematically and scientifically. Probability theory helps us to make

wiser decisions.

Learning Objective 1

Definition

“A”. It is denoted by P(A). It is the ratio between the favourable outcomes to an event

“A” (m) to the total outcomes of the experiment (n). In other words

a. Experiment:

Tossing a coin is an experiment if it shows Head or tail on falling. If it stands on its edge,

then it is not an experiment.

b. Random Experiment:

experiment or stochastic experiment

c. Sample space or total number of outcomes of an experiment is the set of all possible

outcomes of a random experiment and is denoted by S.

In tossing two coins S = {HH, HT, TH, TT}. The number of outcomes is denoted by n(S)

= 4.

If the number of outcomes is finite then it is called Finite Sample Space otherwise it is

called Infinite Sample Space.

d. Event:

P (A) = ½

therefore P(A) = 2/4 = 1/2 . It is a subset of sample space.

Two or more events are said to be equally likely if they have equal chance of occurrence.

In tossing an unbiased coin getting head and tail are equally likely.

Two or more events are said to be mutually exclusive if the occurrence of one prevents

the occurrence of other events.

In tossing a coin if head falls, it prevents the occurrence of tail and vice versa.

A set of events is exhaustive if one or other of the events in the set occurs whenever the

experiment is conducted. It can be defined also as the set whose totality of sample points

form the total sample points of the experiment.

i. Independent Events: Two events are said to be independent of each other if the occurrence

of one is not affected by the occurrence of the other and does not affect the occurrence of the

other.

Illustration

Event A, B, C and D are mutually exclusive and exhaustive but not equally likely.

Approaches to probability:

There are four approaches to probability. They are i) Classical / Mathematical / Priori

approach ii) Statistical / Relative frequency / Empirical / posteriori approach iii)

Subjective approach and iv) Axiomatic approach.

Under this approach the probability of an event is known before conducting the

experiment

Examples:

b) Drawing a king from well shuffled pack

outcomes, n – total number of outcomes of the experiments.

However it is not possible to give probability to all events of our life. We cannot attach a

definite probability to the event “that it will rain today”.

experiment. If we want to know the probability that a particular household in an area will

have two earning members, then we have to gather data on all household in that area and

arrive at the probability. The greater number of households surveyed, the more accurate

will be the probability arrived.

In real life it is not possible to conduct experiments because of high cost or of destructive

type experiments or of vast area to be covered.

Under this approach the investigator or Researcher assigns probability to the events either

from his experience or from past records. It is more suitable when the sample size is ten

or less than ten. The investigator has full knowledge about the characteristics of each and

every individual. However there is a chance of personal bias being introduced in such

probability.

such that

a. 0 ≤ P (Ai) ≤ 1

b. ∑ P (Ai) = 1 for i = 1 to n

Rules of Probability

a. Addition Rule

i. If A and B are any two events then the probability of the occurrence of either A or B is

given by

P (A U B) = P (A) + P (B) – P (A ∩ B)

events then the probability of occurrence of either A or B is given by

P (A U B) = P (A) + P (B)

iii. If A, B and C are any three events then the probability of occurrence of either A or B

or C is given by

P (A U B U C) = P (A) + P (B) + P (C) – P (A ∩ B) – P (B ∩ C) – P (A ∩ C) + P (A ∩ B ∩ C)

iv. If A1, A2, A3………An are “n” mutually exclusive and exhaustive events then the

probability of occurrence of at least one of them is given by P (A1 U A2 U ……… U An) = P (A1) + P (A2) + ……… + P (An) = 1

Managers very often come across situations where they have to take a decision

about implementing either course of action A or course of action B or the course of action

C. Sometimes they have to take decisions regarding the implementation of both A and B.

For example a sales manager may like to know the probability that he will exceed the

target for product A or product B. Sometimes he would like to know the probability that

sales of product A and B will exceed the target. The first type of probability is answered

by addition rule. The second type of probability is answered by multiplication rule.

b. Multiplication Rule

i. If A and B are two independent events then the probability of occurrence of A and B is

given by P (A ∩ B) = P (A) . P (B)

Conditional Probability:

Sometimes we wish to know the probability that the price of a particular petroleum

product will rise, given that the finance minister has raised the petrol price. Such

probabilities are known as conditional probability.

Thus the conditional probability of occurrence of an event “A” given that the event “B”

has already occurred is denoted by P (A / B). Here A and B are dependent events.

Therefore we have the following rules.

If A and B are dependent events then the probability of occurrence of A and B is given by P (A ∩ B) = P (A) . P (B / A) = P (B) . P (A / B)

For any Bivariate distribution there exist two marginal distributions and “m + n”

conditional distributions, where m and n are the number of classifications / characteristics

studied on two variables. Consider the following example.

A librarian analyzed the type of visitors and their choice of library section as follows:

Level of education    News Paper    Magazine    Novel (story)    Subject    Total
Under Graduates       50            100         120              50         320
Graduates             70            90          50               100        310
Post Graduates        100           60          30               150        340
Total                 220           250         200              300        970

i.

Undergraduates 320

Graduates 310

Post graduates 340

Total 970

Therefore it is called Marginal Distribution

ii.

News

Magazine Novels Subjects Total

paper

220 250 200 300 970

levels. It is another Marginal Distribution. Thus there are two marginal distributions for

Bivariate data, variables being sections and level of education.

iii.

Level of

News paper Magazine Novels Subjects Total

education

Under graduate 50 100 120 50 320

This represents the distribution of people in sections given that they are under graduate.

Therefore it is a conditional distribution. Thus for any Bivariate distributions having m

and n classifications there exist two marginal distributions and m + n conditional

distributions. In this case there are 3 + 4 = 7 conditional distributions.

iv. If the words “either, or” is used check whether the events are mutually exclusive or

not, to apply addition rule.

v. If the words “both or and” used, check whether the events are independent or

dependent, to apply proper multiplication rules.

vi. To find the total outcome of the experiment use 2^n or 6^n in the case of coin or die

respectively, where “n” is the number of coins or dice thrown at a time or a coin or die

thrown “n” times. In all other cases use nCr.

Example 1:

1. 0! = 1

Solution:

S = {H, T}

∴ n(S) = 2

n(A) = 1

Therefore,

Example 4: What is probability of getting two heads when 3 coins are tossed and what is

the probability of getting at least one head?

⇒ n(S) = 8

Note: n(S) = 2^3 = 8

⇒ n(A) = 3

Therefore,

⇒ n (A) = 4

Therefore,

Example 5: What is the probability of getting a sum “Nine” when two dice are thrown?

n(S) = 6^2 = 36

∴ n(A) = 4

Therefore, P(A) = 4/36 = 1/9
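The favourable outcomes can be listed by enumerating the whole sample space. This is a small sketch using only the standard library; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered pair of faces of two dice.
outcomes = list(product(range(1, 7), repeat=2))

# Event A: the two faces add up to nine.
favourable = [pair for pair in outcomes if sum(pair) == 9]

p_nine = Fraction(len(favourable), len(outcomes))
print(favourable, p_nine)
```

The enumeration confirms n(S) = 36 and n(A) = 4, the favourable pairs being (3, 6), (4, 5), (5, 4) and (6, 3).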

Example 6: A number is selected at random from the numbers 1 to 30. What is the

probability that

Solution: i) Let “A” be the event of selecting a number divisible by 3

n(S) = 30C1 = 30

n(A) =10

n(B) = 4

n(S) = 30 C1 = 30

committee to monitor quality of their products. The company has 5 Scientists, 4

Engineers and 6 Accountants. Find the probability that the committee will contain 2

Scientists, 1 Engineer and 2 Accountants.

Solution: Let “A” be the event of selecting 2 Scientists, 1 Engineer and 2 Accountants.

n(S) = 15C5 = 3003

n(A) = 5C2 × 4C1 × 6C2 = 10 × 4 × 15 = 600

Therefore, P(A) = 600/3003
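The counts in this example are combinations, which Python provides as math.comb. The sketch below assumes a committee of 5 drawn from the 15 staff members (5 Scientists, 4 Engineers, 6 Accountants), as the 15C5 in the solution indicates.

```python
from fractions import Fraction
from math import comb

# Ways to choose 2 of 5 Scientists, 1 of 4 Engineers and 2 of 6 Accountants.
favourable = comb(5, 2) * comb(4, 1) * comb(6, 2)

# Ways to choose any 5 people out of 15.
total = comb(15, 5)

probability = Fraction(favourable, total)
print(favourable, total, probability, round(float(probability), 4))
```

Fraction reduces 600/3003 to 200/1001, roughly 0.2.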

Example 9: The odds favouring the event of a person hitting a target are 3 to 5. The odds

against the event of another person hitting the target are 3 to 2. If each of them fire once

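at the target, find the probability that i) both of them hit it ii) at least one of them hits it.

Solution: i) Let “A” be the event of the first person hitting the target. Odds in favour of 3 to 5 mean 3 favourable cases against 5 unfavourable cases.

Therefore, P(A) = 3/(3 + 5) = 3/8

Let “B” be the event of the second person hitting the target. Odds against of 3 to 2 mean 3 unfavourable cases against 2 favourable cases.

Therefore P(B) = 2/(3 + 2) = 2/5

Therefore, since A and B are independent, P(A ∩ B) = (3/8) X (2/5) = 6/40 = 3/20

ii) P(at least one of them hits) = 1 – P(Ac ∩ Bc) = 1 – (5/8) X (3/5) = 1 – 3/8 = 5/8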

Example 10: The probabilities that drivers A, B and C will drive home safely after

consuming liquor are 2/5, 3/7 and 3/4, respectively. What is the probability that they will

drive home safely after consuming liquor?

Given

(2/5)X(3/7)X(3/4)=9/70

Example 11: The probabilities that “A” and “B” will tell truth are 2/3 and 4/5

respectively. What is the probability that i) they agree with each other ii) they contradict

each other while giving a witness in the court.

i. Both will agree if they both tell the truth or both lie, i.e. A ∩ B or Ac ∩ Bc. These are mutually exclusive.

∴ P(agree) = P(A ∩ B) + P(Ac ∩ Bc) = (2/3) X (4/5) + (1/3) X (1/5) = 9/15 = 3/5

ii) They will contradict if A tells truth and B tells lies or B tells truth and A tells lies

∴ P(contradict) = (2/3) X (1/5) + (1/3) X (4/5) = 6/15 = 2/5

Example 12: A bag contains 5 red and 4 blue similar balls. Two balls are drawn at

random from the bag. Find the probability that both of them are red if i) the balls are

drawn together ii) the balls are drawn one after the other, with replacement iii) the balls

are drawn one after the other, without replacement.

n(S) = 9C2 = (9 X 8)/(1 X 2) = 36

n(A) = 5C2 = (5 X 4)/(1 X 2) = 10

Therefore, P(A) = 10/36 = 5/18

ii) Let “A” be the event of drawing a red ball in the first draw

Let “B” be the event of drawing a red ball in the second draw

Since the first ball is replaced, the draws are independent and P(A ∩ B) = P(A) X P(B) = (5/9) X (5/9) = 25/81

iii) Let “A” be the event of drawing a red ball in the first draw

Let “B” be the event of drawing a red ball in the second draw

Since the first ball is not replaced the sample space changes for the second draw, and P(A ∩ B) = P(A) X P(B/A) = (5/9) X (4/8) = 5/18

Example 13: Box I contains 5 Red and 6 Blue balls. Box II contains 6 Red and 4 Blue

balls. A ball is drawn at random from box I and is transferred to box II. Now from Box II

a ball is drawn at random. What is the probability that it is red?

Solution: A ball drawn from Box I and transferred to Box II could be either red or blue.

Let “A” be the event of drawing a red ball from Box I.

Let “C” be the event of drawing red ball from Box II.

Example 14: The probabilities that component A and component B of a machine will fail

are 0.09 and 0.06 respectively. The machine will fail if any one of them fails. Find the

probability that it will fail?

P(B) = 0.06

= 0.1446

It has 52 Mondays. For one more Monday we select from the following combination of

the remaining 2 days.

∴ n(S) = 7 and n(A) = 2

Baye’s Probability

Let A1, A2, A3, A4 be mutually exclusive and exhaustive events of a random

experiment. Let “B” be a common event. In Venn Diagram it is presented as

follows.

= [From (3)]

In general Bayes’ Theorem states that if A1, A2…………..An are “n” mutually exclusive

and exhaustive events and B is an event common to all of them, then the probability of

occurrence of A1 given that “B” has already occurred is given by

P(A1/B) = P(A1) P(B/A1) / [P(A1) P(B/A1) + P(A2) P(B/A2) + ………… + P(An) P(B/An)]

conditional probability and Bayes’ probability is as follows:-

1. Bayes’ probability finds the probability of the population value, given the sample value. Conditional probability finds the probability of getting a sample value, given the population value.

2. With Bayes’ probability it is possible to incorporate latest information. With conditional probability it is not possible to do so.

3. With Bayes’ probability it is possible to incorporate cost aspects. With conditional probability it is not possible in this case.

Whenever there are two probabilities connected with an event then we have to apply

Baye’s approach to solve it.

Example 16: The probabilities that Mr.Aravind, Mr.Anand and Mr.Akil will become

vice-president of a company are 0.40, 0.35 and 0.25 respectively. The probabilities that

they will introduce new product are 0.10, 0.15 and 0.20 respectively. What is the

probability that Mr.Anand introduced a new product by becoming vice-president?

Solution:

We are given that P(A1) = 0.4, P(A2) = 0.35, P(A3) = 0.25, P(B/A1) = 0.10, P(B/A2) =

0.15, P(B/A3) = 0.20.

Event Ai    Prior Probability P(Ai)    Conditional prob P(B/Ai)    Joint Prob P(Ai ∩ B)    Posterior Probability P(Ai/B)
A1          0.40                       0.10                        0.0400                  0.0400/0.1425 = 0.2807
A2          0.35                       0.15                        0.0525                  0.0525/0.1425 = 0.3684
A3          0.25                       0.20                        0.0500                  0.0500/0.1425 = 0.3509
Total       1.00                                                   P(B) = 0.1425           1.0000
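The columns of the table (prior, conditional, joint, posterior) can be built step by step in Python. This is a minimal sketch of Bayes’ rule for the three candidates; the dictionary names are illustrative.

```python
# Prior probabilities of becoming vice-president.
priors = {"Aravind": 0.40, "Anand": 0.35, "Akil": 0.25}

# Conditional probabilities of introducing a new product, given the person is vice-president.
likelihoods = {"Aravind": 0.10, "Anand": 0.15, "Akil": 0.20}

# Joint probabilities P(Ai and B), then the total probability P(B).
joint = {name: priors[name] * likelihoods[name] for name in priors}
p_b = sum(joint.values())

# Posterior probabilities P(Ai / B) = joint / P(B).
posterior = {name: joint[name] / p_b for name in joint}

for name in priors:
    print(name, round(joint[name], 4), round(posterior[name], 4))
print("P(B) =", round(p_b, 4))
```

The posterior probabilities add up to 1, and the value for Mr. Anand is about 0.3684.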

Example 17: A factory has three Machines M1, M2 and M3. They produce 4000, 10,000

and 6,000 products per day. From past records it is known that M1, M2, and M3 produce

5%, 4%, and 8% defectives. A product is selected at random from the day’s production.

What is the probability that it was not produced by Machine M3.

Solution:

Let “B” be the event that the product is defective. Then we are given

P(A1) = 4000/20000 = 0.2

P(A2) = 10000/20000 = 0.5

P(A3) = 6000/20000 = 0.3

P(B/A1) = 0.05 P(B/A2) = 0.04 P(B/A3) = 0.08

Event Ai    Prior Probability P(Ai)    Conditional prob P(B/Ai)    Joint Prob P(Ai ∩ B)    Posterior Probability P(Ai/B)
A1          0.2                        0.05                        0.010                   0.010/0.054 = 0.1852
A2          0.5                        0.04                        0.020                   0.020/0.054 = 0.3704
A3          0.3                        0.08                        0.024                   0.024/0.054 = 0.4444
Total       1.00                                                   P(B) = 0.054            1.0000

P(not produced by M3, given that it is defective) = 1 – P(A3/B) = 1 – 0.4444 = 0.5556

Learning Objective 2

Random Variable

sample space, such that

i. P (Xi) = P [X = Xi] ∀ i

ii. P (Xi) ≥ 0 ∀ i

If Xi is a continuous random variable then P(X) is called probability density function and

is denoted by f(X).

1. For example let us consider the tossing of three coins. The resulting events are:

No. of Heads (Xi)    P(Xi)
3                    ⅛
2                    ⅜
1                    ⅜
0                    ⅛
Total                1

For every Xi we are able to assign a P(Xi) such that ∑ P(Xi) = 1. The numbers of heads together with their probabilities

form a probability distribution.

A systematic presentation of random variable with its value and probabilities is called a

probability distribution of that random variable. The distribution will have its mean and

standard deviation.

Mean = E(X) = ∑ Xi P(Xi) and Variance = ∑ Xi² P(Xi) – [E(X)]²

Examples

Example 18:

A Random variable takes the values -3, -2, 1, 0, 4, 6 with probabilities 1/12, 2/12, 3/12,

4/12, 1/12, 1/12 respectively find its mean or expected value and variance.

Solution: Given

Xi       P(Xi)    Xi P(Xi)    Xi² P(Xi)
-3       1/12     -3/12       9/12
-2       2/12     -4/12       8/12
1        3/12     3/12        3/12
0        4/12     0           0
4        1/12     4/12        16/12
6        1/12     6/12        36/12
Total    1        6/12        72/12 = 6

Mean = E(X) = 6/12 = ½

Variance = E(X²) – [E(X)]² = 6 – ¼ = 23/4
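The mean and variance of this distribution can be checked exactly with fractions. The sketch assumes E(X) = Σ Xi P(Xi) and Variance = Σ Xi² P(Xi) − [E(X)]²; variable names are illustrative.

```python
from fractions import Fraction

values = [-3, -2, 1, 0, 4, 6]
probs = [Fraction(k, 12) for k in (1, 2, 3, 4, 1, 1)]

# A valid probability distribution must sum to one.
total_probability = sum(probs)

expected = sum(x * p for x, p in zip(values, probs))
expected_square = sum(x * x * p for x, p in zip(values, probs))
variance = expected_square - expected ** 2

print(total_probability, expected, expected_square, variance)
```

Exact arithmetic avoids rounding and reproduces the mean ½ and the variance 23/4.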

Example 19: Mr. A and B play a game. If “A” picks up an even number from 1 to 6, B

will pay him double the amount equal to picked up number. If “A” picks up an odd

number then he has to pay amount equal to double the picked up number. What is A’s

expectation?

Solution: Let Xi be the random variable (A’s gain) and P(Xi) be its probability, then we have

Number picked    Xi      P(Xi)    Xi P(Xi)
1                -2      1/6      -2/6
2                4       1/6      4/6
3                -6      1/6      -6/6
4                8       1/6      8/6
5                -10     1/6      -10/6
6                12      1/6      12/6
Total                    1        6/6 = 1

∴ A’s expectation = ∑ Xi P(Xi) = 1

Example 20: If Xi is a random variable with the following distribution find i) P(Xi ≥ 3)

ii) P(Xi = 0) iii) P(1 ≤ Xi ≤ 3) iv) P(Xi ≥ 4)

Xi -3 -2 0 1 2 3 4 5

P(Xi) K 2K 2K 3K 3K 2K K K

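Solution: Since Xi is a random variable ∑ P(Xi) = 1, i.e. K + 2K + 2K + 3K + 3K + 2K + K + K = 15K = 1, so K = 1/15

i) P(Xi ≥ 3) = P(Xi = 3) + P(Xi = 4) + P(Xi = 5)

= 2K + K + K = 4K = 4/15

ii) P(Xi = 0) = 2K = 2/15

iii) P(1 ≤ Xi ≤ 3) = P(Xi = 1) + P(Xi = 2) + P(Xi = 3)

= 3K + 3K + 2K = 8K = 8/15

iv) P(Xi ≥ 4) = P(Xi = 4) + P(Xi = 5)

= K + K = 2K = 2/15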

SUMMARY

Probability plays an important role in decision making process. The basic definitions and

approaches were explained with examples. The environments where to use the different

rules are also explained with examples.

MB0024-Unit-06

Introduction

Individuals and corporates generate several data that resembles certain theoretical

distributions. Since mathematically we have many derived characteristics of the

theoretical distributions, we can make use of them for a quick analysis of the observed

distributions. Examples of observed distributions are:-

i. Number of male children in a family.

Learning Objective 1

Bernoulli Distributions:

A variable which assumes values 1 and 0 with probabilities p and q=1-p, is called

Bernoulli variable. It has only one parameter p. For different values of p (0≤ p≤ 1), we

get different Bernoulli distributions.

dichotomous nature i.e. Success / failure, present / absent, defective / non defective, yes /

no etc.

Example: When a fair coin is tossed the outcome is either head or tail. The variable “X”

assumes 1 or 0.

An experiment which results in two mutually exclusive and exhaustive outcomes is called

a Bernoulli experiment. Let a Bernoulli experiment be repeated “n” times under identical

conditions, Let Xi, for i=1 to n, assume the values 1 or 0. Then Xi is a Bernoulli Variate

with probability p. Let X = X1 + X2 +……..+Xn denote the number of successes in the “n”

repetitions. Each Xi has mean p and variance pq, and their sum X follows the Binomial Distribution.

Learning Objective 2

1. Binomial Distribution:

The Binomial Distribution is given by P(X) = nCx p^x q^(n-x), x = 0 to n.

success. The mean and variance of the distribution are np and npq. “n” and “p”

are its parameters. It is a unimodal distribution. For fixed n or p as p or n

increases the distribution shifts from left to right.

ii. The probability of success should remain the same from experiment to

experiment.

manufactured lot.

2. Number of seeds germinating among 10 seeds sown.

3. Number of heads turned in tossing 8 coins.

given by

to construct theoretical distribution for given observed distribution.

to find expected values iii) given the parameters to find the distribution.

Type i)

Example 1: An unbiased coin is tossed 6 times. What is the probability that the

tosses will result in i) Exactly two head ii) At least 5 head iii) at most two heads

iv) not greater than one v) not less than five heads vi) at least one head.

. (1/2)2=

%. In a firm having 5 employees, what is the probability that i) None ii) Exactly

Two iii) More than 4 will contract the disease.

∴ Binomial Distribution is (0.8 + 0.2)^5

i. P(X = 0) = (0.8)^5 = 0.32768

ii. P(X = 2) = 5C2 (0.8)^3 (0.2)^2

= 10 x 0.512 x 0.04

= 0.2048

iii. P(X > 4) = P(X = 5) = (0.2)^5

= 0.00032
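The three probabilities follow from the binomial formula P(X) = nCx p^x q^(n−x) with n = 5 and p = 0.2. The sketch below uses math.comb for nCx; the function name is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, 0.2

p_none = binomial_pmf(0, n, p)            # nobody contracts the disease
p_two = binomial_pmf(2, n, p)             # exactly two employees
p_more_than_four = binomial_pmf(5, n, p)  # more than 4 means all 5

print(round(p_none, 5), round(p_two, 4), round(p_more_than_four, 5))
```

The probabilities over x = 0 to 5 add up to 1, which is a quick way to check such a table.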

Example 3: The probability that a bomb dropped on a bridge hits it is 0.5. Eight

bombs are dropped on the bridge. The bridge will be destroyed if any two bombs

fall on it. i) Find the probability that all bombs hit it ii) the bridge is destroyed.

Solution:

Let the probability that the bomb will hit the bridge be p.

Type ii)

Example 4: A random sample of 5 sachets of coconut oil were examined and two

were found to be leaking. A wholesaler receives six hundred and twenty five

packets, each containing 5 sachets. Find the expected number of packets to

contain exactly one sachet leaking.

Solution:

Given n = 5, p = 2/5 and q = 3/5

Binomial Distribution = (3/5 + 2/5)^5

P(X = 1) = 5C1 (3/5)^4 . (2/5)^1 = 162/625

Expected number of packets = NP = 625 x (162/625) = 162

Example 5: Bring out the fallacy, if any, in the following statement on Binominal

Distribution. “The mean of a B.D is 4 and variance is 5″.

Solution:

Given np = 4 (Mean)……………..(1)

npq = 5 (Variance)………….(2)

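Therefore, npq/np = 5/4, i.e. q = 5/4. Since q is a probability it cannot be greater than 1, so the given statement is wrong.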

mean is 3 and variance is 2.

Solution:

and since np = 3

n . 1/3 = 3 or n = 9

P(X = 3) = 9C3 (2/3)^6 . (1/3)^3 = 1792 / 6561

Learning Objective 3

Poisson Distribution:

P(X) = e^(-m) . m^X / X!

X varies from 0 to infinity. The mean and the variance of the distribution are both m. The mean is

also given by np, i.e. m = np, where p is the probability of success and n is the

number of trials. It is a unimodal distribution. It is also known as the distribution

of “RARE EVENTS”. It is the limiting form of Binomial Distribution as n tends

to infinity.

Assumption

such that np is a constant m.

5. Number of incoming telephone calls at an exchange per minute.

6. Number of radio-active particles emitted by substances.

7. Number of defects in a product.

8. Number of micro-organisms developed during a period.

Recurrence Relation

Example 7: Suppose 2 houses in a thousand catch fire in a year and there are

2000 houses in a village. What is the probability that

i) none ii) at least one iii) not more than 2 houses catch fire.

Solution:

P=2/1000=0.002 n=2000

∴ m = np = 2000 x 0.002 = 4

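i. P(X = 0) = e^(-4) = 0.0183

ii. P(X ≥ 1) = 1 – P(X = 0) = 1 – 0.0183 = 0.9817

iii. P(X ≤ 2) = e^(-4) (1 + 4 + 4²/2!) = 13 e^(-4) = 0.2381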

defective. A carton contains 200 bulbs. Find the probability that the carton

contains 3 or more defective bulbs.

Solution:

n = 200

∴ m = np = 200 x 0.01 = 2

P(X ≥ 3) = 1 – [P(X = 0) + P(X = 1) + P(X = 2)] = 1 – e^(-2) [1 + 2 + 2]

= 1 – 0.67668 = 0.3233
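The Poisson probabilities in this example can be evaluated directly from P(X) = e^(−m) m^X / X! with m = np = 200 × 0.01 = 2. This is a minimal sketch; the function name is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, m):
    """Probability of exactly x events when the mean number of events is m."""
    return exp(-m) * m ** x / factorial(x)

m = 200 * 0.01  # mean number of defective bulbs per carton

p_at_most_two = sum(poisson_pmf(x, m) for x in range(3))
p_three_or_more = 1 - p_at_most_two

print(round(p_at_most_two, 5), round(p_three_or_more, 4))
```

Since e^(−2) is about 0.13534, the bracket 1 + 2 + 2 = 5 gives P(X ≤ 2) of about 0.6767, so P(X ≥ 3) is about 0.3233.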

contains 200 pages. What is the probability that a randomly selected page has

exactly one mistake?

Solution:

Given m = 3

P(X=1)=e-3.3/1!=0.04979×3=0.14937

Example 10:

In example 9, how many pages would you expect to be free from mistakes.

Solution:

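Expected number of pages free from mistakes = 200 x P(X = 0) = 200 x e^(-3) = 200 x 0.04979 = 9.958, i.e. about 10 pages.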

Type iii)

Example 11: X is a Poisson Variate such that P(X = 1) = P(X = 2). Find P(X = 0)

Solution:

P(X = 1) = P(X = 2)

Or, e^(-m) m/1! = e^(-m) m²/2!

Or, m/1 = m²/2, so m = 2

Therefore, P(X = 0) = e^(-2) = 0.1353

Learning Objective 4

Normal Distribution

10. Its probability density function is given by f(X) = [1/(σ√(2π))] e^(-(X – μ)²/(2σ²)), where X varies from -∞ to ∞.

12. Its mean is μ and standard deviation is σ.

13. μ and σ are the parameters of the distribution.

14. It is a bell-shaped curve.

15. It is symmetric about its mean.

16. The mean divides the curve into 2 equal portions.

17. Its Quartile Deviation, Q.D = 2/3 σ (approximately)

18. Its Mean Deviation, M.D = 4/5 σ (approximately)

19. The X – axis is an asymptote to the curve [an asymptote is a straight line that

the curve approaches but meets only at infinity].

20. The points of inflexion occur at μ ± σ

21. It is a unimodal Distribution.

22. Mean, Median and Mode coincide.

23. The area under normal curve within certain limits are as shown below

Standard Normal Distribution

the transformation Z = (X – μ)/σ. Z is called the Standard Normal variate. Its

distribution forms a standard normal distribution whose probability density

function is given by f(Z) = [1/√(2π)] e^(-Z²/2). Its mean is 0 and its standard

deviation is 1. Statisticians have developed the standard normal table. The table

gives the probability that z will lie between 0 and Z. Therefore to solve any

problem in Normal Distribution, we convert it to standard normal distribution,

calculate z and then refer to the table.

Examples

Examples 11:

The weights of Bournvita packs packed by a filling machine follow a normal distribution with a mean weight of 500 gms and a standard deviation of 10 gms. A pack is selected at random. What is the probability that i) its weight will exceed 515 gms, ii) its weight lies within 480 to 520 gms? iii) What proportion of packs will weigh less than 480 gms or more than 520 gms? If 10,000 packs are supplied, how many will be rejected if 480 gms and 520 gms are the lower and upper limits for acceptance?

Solution: To solve such a problem we draw the normal curve and represent the information given in the problem as follows.

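i.

P(X ≥ 515) = 0.5 – P(500 ≤ X ≤ 515)

= 0.5 – P[0 ≤ Z ≤ 1.5], since Z = (515 – 500)/10 = 1.5

= 0.5 – 0.4332

= 0.0668

ii)

P[480 ≤ X ≤ 520]

= P[–2 ≤ Z ≤ 0] + P[0 ≤ Z ≤ 2]

= 0.4772 + 0.4772

= 0.9544

iii)

The probability of acceptance is 0.9544, as found in (ii), so the proportion of packs weighing less than 480 gms or more than 520 gms is 1 – 0.9544 = 0.0456. Out of 10,000 packs, about 10,000 x 0.0456 = 456 will be rejected.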
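The table look-ups in this example can be cross-checked with Python's standard library. This is only a sketch; the variable names are assumptions, and NormalDist works with unrounded areas, so its answers may differ from table-based ones in the last digit.

```python
from statistics import NormalDist

packs = NormalDist(mu=500, sigma=10)          # pack weight in gms

p_exceeds_515 = 1 - packs.cdf(515)            # part i
p_within = packs.cdf(520) - packs.cdf(480)    # part ii
p_rejected = 1 - p_within                     # part iii
expected_rejected = 10_000 * p_rejected       # rejected packs out of 10,000

print(round(p_exceeds_515, 4), round(p_within, 4), round(expected_rejected))
```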

Type iii)

Example 12:

The sales volumes of 1000 retail outlets of a soap company follow a Normal Distribution. 20% of the retail outlets sell less than 50 units per day and 15% of them sell 200 units and above. a) Find the mean and standard deviation of the sales volume. b) Find the expected number of retail outlets that sell between 50 and 118 units.

Solution:
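One way to set up this problem is to note that the two given percentages fix two z values, which give two equations in the unknown mean and standard deviation. The sketch below follows that approach in Python; the variable names are assumptions, and the z values come from inv_cdf rather than from a printed table, so hand-worked answers may differ slightly.

```python
from statistics import NormalDist

std_normal = NormalDist()            # mean 0, standard deviation 1

z_low = std_normal.inv_cdf(0.20)     # 20% of outlets sell less than 50 units
z_high = std_normal.inv_cdf(0.85)    # 15% of outlets sell 200 units and above

# (50 - mean) / sd = z_low   and   (200 - mean) / sd = z_high
sd = (200 - 50) / (z_high - z_low)
mean = 50 - z_low * sd

sales = NormalDist(mu=mean, sigma=sd)
outlets_50_to_118 = 1000 * (sales.cdf(118) - sales.cdf(50))

print(round(mean, 1), round(sd, 1), round(outlets_50_to_118))
```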

SUMMARY

Quick analysis of observed data can be done if it is identified with the theoretical

distribution. The probabilities associated with random Variate of the distribution

help us to know the chances of occurrence of several events within specified

values. We can extend the solution to the cost aspects also.

MB0024-Unit-07

Introduction

In different fields of human activity, and in the ordinary actions of our daily life, the decision making process is based on observations of a few units which form a portion of the total population. This process of studying only a portion of the population and making decisions involves risk, the risk of making wrong decisions. Evaluation of this risk will be discussed in the unit on Testing of Hypothesis. This unit deals with the various techniques of drawing samples from the population.

Learning Objective 1

a. Universe or Population:

Statistical surveys or enquiries deal with studying various characteristics of units belonging to a group. The group consisting of all the units is called the Universe or Population.

Example: In the statistical survey aimed at determining average per capita income of the

people in the city, all earning individuals in the city form the population.

Population. A population consisting of infinite number of units or units such that it is

practically impossible to observe all the units is called Infinite Population.

of physical objects actually exists. Given limited resources and time it is practically not

possible to count the number of grains of sand on the beach. Such populations are termed

as infinite population for our study.

consisting of concrete objects like the books in library is known as existent population.

Throwing a coin infinite number of times produces Hypothetical Population.

b. Sample is a finite subset of a population drawn from it to estimate the characteristics of

the population. Sampling is a tool which enables us to draw conclusions about the

characteristics of the population.

• It results in a considerable saving of time and labour.

• The effort required to organize and administer a sample survey is relatively much less.

• The results obtained are reliable, and it is always possible to attach a degree of reliability to them.

• There is a possibility of obtaining detailed information. In other words, there is greater scope.

• In the case of an infinite population, it is the only available method.

• If the units are destroyed or adversely affected in the course of investigation, then sampling is the only method.

Sampling Theory

The law lays down that a group of units chosen at random from a large group tends to possess the characteristics of that large group.

Suppose a particular characteristic of the population has the following shape, then the same characteristic will tend to follow the same shape in the sample.

This principle states that “other things being equal, as the sample size increases, the results tend to be more reliable and accurate”.

Suppose the population mean is 25 units. If a sample of size 50 gives an average of 24.5 units, then a larger sample of size 100 may give an average of, say, 24.8 units. In other words, the larger the sample size, the more accurate the result tends to be.

c. Principle of persistence of small numbers:

If some of the units in a population possess markedly distinct characteristics, then it will

be reflected in the sample values also.

For example, if there are 300 blind persons in a population of 10,000 persons, then a sample of a hundred will have more or less the same proportion of blind persons in it.

d. Principle of Validity:

A sampling design is said to be valid if it enables us to obtain valid tests and estimates of population parameters.

e. Principle of Optimization:

obtaining maximum possible efficiency with given level of cost.

a. Parameter: Any statistical measure, like the mean, median, etc., calculated from population values is known as a parameter of the population and is denoted by a Greek letter (µ, σ etc.).

b. Statistic: Any statistical measure calculated from the sample values is known as a statistic and is denoted by an English letter (X̄, S, etc.).

possible combinations and their means are tabulated below.

Sample No.   Sample   Mean
1            1,2      1.5
2            1,3      2
3            1,4      2.5
4            1,5      3
5            2,3      2.5
6            2,4      3
7            2,5      3.5
8            3,4      3.5
9            3,5      4
10           4,5      4.5

This gives the means of samples of size 2. We form a distribution of sample means.

Mean (x)   Frequency (f)   fx     fx²
1.5        1               1.5    2.25
2          1               2.0    4.00
2.5        2               5.0    12.50
3          2               6.0    18.00
3.5        2               7.0    24.50
4          1               4.0    16.00
4.5        1               4.5    20.25
Total      N = 10          30.0   97.50

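The above table represents the sampling distribution of means. We observe that the mean of the sample means, Σfx/N = 30/10 = 3, is equal to the population mean.

d. Standard error

The standard deviation of the sampling distribution of means is S = √[Σfx²/N – (Σfx/N)²] = √(97.5/10 – 3²) = √(0.75) = 0.866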
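The sampling distribution tabulated above can be rebuilt by enumerating every sample. The sketch below assumes that the population consists of the five values 1 to 5 (inferred from the sample table) and that samples of size 2 are drawn without replacement; the variable names are illustrative.

```python
import math
from itertools import combinations
from statistics import mean, pvariance

population = [1, 2, 3, 4, 5]                   # assumed from the sample table
samples = list(combinations(population, 2))    # all 10 samples of size 2
sample_means = [mean(s) for s in samples]

mean_of_means = mean(sample_means)             # compare with the population mean
variance_of_means = pvariance(sample_means)    # variance of the sampling distribution
standard_error = math.sqrt(variance_of_means)

print(len(samples), mean_of_means, variance_of_means, round(standard_error, 3))
```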

In other words the standard deviation of sampling distribution of any statistic is called

standard error of that statistic. Standard error helps us in

i) Testing of hypothesis.

iii) Giving reliability measure for the statistic by its reciprocal value.

Errors in Statistics

The term error denotes the difference between population value and its estimate provided

by sampling technique. Therefore the term is not referred in its ordinary sense in

statistics.

a) Sampling error

c) Biased errors

d) Unbiased errors

a. Sampling error: The sample results are bound to differ from the population results, since a sample is only a small portion of the population. Sampling errors are also known as inherent errors and cannot be avoided. It is not worthwhile to try to eliminate them completely.

However, they follow random or chance variations and tend to cancel each other out on averaging.

They are attributed to factors that can be controlled and eliminated by suitable actions. It is worthwhile to eliminate these errors. They are due to the following factors.

c. Biased errors.

They arise in both the census and the sampling method. They are due to the personal bias of the investigator and to the instruments used for measuring. They are also due to faulty collection of data, respondent’s bias and bias due to non-response.

Biased errors have a tendency to grow with sample size. Therefore they are also known as cumulative errors. The magnitude of biased errors is directly proportional to sample size.

d. Unbiased errors.

Errors in which over-estimates and under-estimates are equal in magnitude, so that they offset each other, are known as unbiased errors. They are also known as compensatory errors. They do not increase with sample size.

i. Absolute Error: the difference between the true value (t) and the observed value (a). Symbolically, AE = t – a. It is independent of the magnitude of the actual value.

ii. Relative Error: the ratio of the Absolute Error to the actual value. Symbolically, RE = AE/t = (t – a)/t.

It provides a degree of error for comparison purposes between different sets of data.

Learning Objective 2

i. Probability Sampling

Probability Sampling.

the law in which each unit has a predetermined probability of being included in the

sample. Different ways of assigning probability are

Under this technique sample units are drawn in such a way that each and every unit in the population has an equal and independent chance of being included in the sample. If the sample unit is replaced before drawing the next unit, then it is known as Simple Random Sampling with replacement [SRSWR]. If the sample unit is not replaced before drawing the next unit, then it is called Simple Random Sampling without replacement [SRSWOR]. In the first case the probability of drawing a given unit at any draw is 1/N, where N is the population size. In the second case the probability of drawing a given remaining unit at the r-th draw is 1/(N – r + 1), since r – 1 units have already been removed.

Selection of Simple Random Sampling can be done by a) Lottery Method b) the use of

table of random numbers.

a) In the lottery method we identify each and every unit with a distinct number by allotting it an identical card. The cards are put in a drum and thoroughly shuffled before each unit is drawn.

b) There are several random number tables, such as Tippett’s Random Number Table, Fisher and Yates’ tables, Kendall and Babington Smith’s random tables and the Rand Corporation random numbers. A specimen of random numbers by Tippett is given below.

Suppose we want to select 10 units from a population of size 100. We number the population units from 00 to 99. Then we start taking 2 digits at a time. Suppose we start with 41 (second row); then the other numbers selected will be 67, 95, 24, 15, 45, 13, 96, 72, 03.
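In practice a pseudo-random number generator often takes the place of a printed random number table. The sketch below draws 10 units from a population numbered 00 to 99, once without and once with replacement. It is an illustration only: the seed value is arbitrary, and the units it selects will not match the Tippett specimen.

```python
import random

rng = random.Random(41)               # arbitrary fixed seed so the draw is repeatable
units = list(range(100))              # population units numbered 00 to 99

srswor = rng.sample(units, 10)        # SRSWOR: no unit can be drawn twice
srswr = rng.choices(units, k=10)      # SRSWR: the same unit may be drawn again

print(sorted(srswor))
print(sorted(srswr))
```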

respect to characteristic under study or the population distribution is highly

skewed.

We subdivide the population into several groups or strata such that i) units within each stratum are more homogeneous, ii) units in different strata are heterogeneous and iii) strata do not overlap; in other words, every unit of the population belongs to one and only one stratum.

The criteria used for stratification are geographical, sociological, age, sex, income etc. The population of size N is divided into ‘K’ relatively homogeneous strata of sizes N1, N2, …, Nk such that N1 + N2 + … + Nk = N. Then we draw a simple random sample from each stratum, either proportional to the size of the stratum or with an equal number of units from each stratum.

different segments of population

Demerits

b. Appropriate sample sizes are not drawn from each of the stratum.

Example

Suppose 200, 300 and 500 items are produced by factories located at three cities X, Y and Z. We wish to draw a sample of 20 items under proportional stratified sampling. We number the units from 0 to 999. Then we refer to the random number table and select the numbers as

For Factory Z, it is 20 x (500/1000) = 10

854, 772, 733, 741, 822, 853, 570, 802, 629, 525
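The proportional allocation in this example can be sketched as follows. The code assumes that the units are numbered in factory order (X gets 0 to 199, Y gets 200 to 499, Z gets 500 to 999), which is consistent with the numbers listed for Factory Z but is not stated in the text. The seed is arbitrary, so the selected units are illustrative only.

```python
import random

strata_sizes = {"X": 200, "Y": 300, "Z": 500}   # items produced per factory
sample_size = 20
population_size = sum(strata_sizes.values())

# Proportional allocation: sample_size x (stratum size / population size)
allocation = {city: sample_size * size // population_size
              for city, size in strata_sizes.items()}

# Draw a simple random sample without replacement inside each stratum
rng = random.Random(7)          # arbitrary seed, only to make the draw repeatable
selected, first_unit = {}, 0
for city, size in strata_sizes.items():
    stratum_units = range(first_unit, first_unit + size)
    selected[city] = sorted(rng.sample(stratum_units, allocation[city]))
    first_unit += size

print(allocation)
```

The allocation works out to 4 items from X, 6 from Y and 10 from Z, which together make up the required sample of 20.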

Systematic Sampling

in some systematic order such as geographical, chronological or alphabetical

order.

Suppose the population size is “N”. The population units are serially numbered 1 to N in some systematic order and we wish to draw a sample of “n” units. We divide the units from 1 to N into “n” groups such that each group has K units. This implies nK = N or K = N/n. From the first group we select a unit at random. Suppose the unit selected is the 6th unit; thereafter we select every Kth unit, that is, the (6 + K)th, (6 + 2K)th, … units. If K = 20, n = 5 and N = 100 then the units selected are 6, 26, 46, 66, 86.
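The selection rule for systematic sampling is short enough to write out directly. In this sketch the function name systematic_sample is an assumption, and N is taken to be an exact multiple of n.

```python
import random

def systematic_sample(N, n, start=None, rng=random):
    """Select n units from 1..N: a start within the first K units, then every K-th unit."""
    K = N // n                          # sampling interval, assuming n divides N
    if start is None:
        start = rng.randint(1, K)       # random start within the first group
    return [start + i * K for i in range(n)]

print(systematic_sample(100, 5, start=6))
```

With N = 100, n = 5 and a start at the 6th unit, this reproduces the units 6, 26, 46, 66, 86 from the text.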

Cluster Sampling

The total population is divided into recognizable sub-divisions, known as clusters

such that within each cluster units are more heterogeneous and between clusters

they are homogenous. The units are selected from each cluster by suitable

sampling techniques.

Multi-stage Sampling

The total population is sampled in several stages. For example, suppose we want to select 1000 colleges from the southern states. In the first stage we may select any three states. In the second stage we may select some districts in each selected state. In the third stage we may select colleges in each selected district. We may adopt any sampling technique at each stage.

Demerits are

sampled.

2. Non-Probability Sampling

number of sample units is selected purposely so that they represent the true

characteristics of the population.

nature. The selection of sample units depends entirely upon the personal

convenience, biases, prejudices and beliefs of the investigator. This method will

be more successful if the investigator is thoroughly skilled and experienced.

Judgment Sampling

investigator. The investigator’s experience and knowledge about the population

will help to select the sample units. It is most suitable method if the population

size is less.

Merits of this method are:-

whose characteristics are known.

Convenience Sampling

also called “chunk” which refers to the fraction of the population being

investigated which is selected neither by probability nor by judgment. Further a

list or frame work should be available for the selection of the sample. There is

high chance of bias being introduced. It is used to make pilot studies.

Quota Sampling

It is a type of judgment sampling. Under this design Quotas are set up according

to some specified characteristic such as age group, income groups etc. From each

group a specified number of units are sampled according to the Quota allotted to

the group. Within the group the selection of sample units depends on personal

judgment. It has a risk of personal prejudice and bias entering the process. This

method is often used in public opinion studies.

Learning Objective 3

Sample size depends upon the size of the population; the resources available, the

degree of accuracy desired, homogeneity of the population, nature of study,

Methods of sampling used and nature of respondents.

(For infinite population)

value, Ps – Sample value, which implies that P – Ps is the error desired in the result, Q = 1 – P and “n” is the sample size.

N is population size.

iii.

iv.

v.

n Sample Size

If X1, X2, …, Xn is a random sample of size “n” from any population with mean µ and variance σ², then the sample mean (X̄) is approximately normally distributed with mean µ and variance σ²/n, provided “n” is sufficiently large.

From the theorem we infer that i) the mean of the sampling distribution of the mean will be equal to the population mean, ii) the sampling distribution of the mean approaches the normal distribution as the sample size increases and iii) it permits us to use sample statistics to make inferences about population parameters irrespective of the shape of the frequency distribution of the population.
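A small simulation can illustrate the theorem. The sketch below assumes a uniform population on [0, 1), which is clearly not normal, and compares the mean and spread of 2000 sample means with µ and σ/√n. The population, the sample size, the number of repetitions and the seed are all illustrative choices rather than values from the text.

```python
import math
import random
from statistics import mean, pstdev

rng = random.Random(0)                       # arbitrary seed for repeatability
mu, sigma = 0.5, math.sqrt(1 / 12)           # mean and SD of a uniform [0, 1) population
n = 50                                       # size of each sample

# Draw 2000 samples of size n and record each sample mean
sample_means = [mean([rng.random() for _ in range(n)]) for _ in range(2000)]

observed_mean = mean(sample_means)           # should be close to mu
observed_se = pstdev(sample_means)           # should be close to sigma / sqrt(n)
theoretical_se = sigma / math.sqrt(n)

print(round(observed_mean, 3), round(observed_se, 4), round(theoretical_se, 4))
```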

SUMMARY

There are two methods of studying the characteristics of a population: census and sampling. The various advantages of sampling and the various errors that could crop up in using these methods were explained. There are two main methods of sampling, namely i) probability sampling and ii) non-probability sampling. The merits and demerits of each sampling method were explained. We discussed the procedure for determining sample size. We concluded the unit with the importance of the central limit theorem.

MB0024-Unit-08

Introduction

Everyone makes estimates. When you are ready to cross a street, you estimate the speed

of any car that is approaching, the distance between you and that car, and your own

speed. Having made these quick estimates, you decide whether to wait, walk, or run.

Learning Objective 1

All managers must make quick estimates too. The outcome of these estimates can affect

their organizations as seriously as the outcome of your decision as to whether to cross the

street. Credit managers estimate whether a purchaser will eventually pay his bills.

Prospective home buyers make estimates concerning the behaviour of interest rates in the

mortgage market. All these people make estimates without worry about whether they are

scientific but with the hope that the estimates bear a reasonable resemblance to the

outcome.

Managers use estimates because in all but the most trivial decisions, they must make

rational decisions without complete information and with a great deal of uncertainty

about what the future will bring. As educated citizens and professionals, you will be able

to make more useful estimates by applying the techniques described in this and

subsequent chapters.

Statistical inference is based on estimation, and hypothesis testing. In both estimation and

hypothesis testing, we shall be making inferences about characteristics of populations

from information contained in samples. Here we infer something about a population from

information taken from a sample.

Here we try to estimate with reasonable accuracy the population proportion (the

proportion of the population that possesses a given characteristic) and the population

mean. To calculate the exact proportion or the exact mean would be an impossible goal.

Even so, we will be able to make an estimate, and implement some controls to avoid as

much of the error as possible.

Types of estimates

2. an interval estimate

population parameter. A point estimate is often insufficient because it is either right or wrong; if it is wrong, we do not know how wrong it is. Therefore, a point estimate is much more useful if it is accompanied by an estimate of the error that might be involved.

It indicates the error in two ways: by the extent of its range and by the probability

of the true population parameter lying within that range.

Learning Objective 2

Unbiasedness: This is a desirable property for a good estimator to have. The term

unbiasedness refers to the fact that a sample mean is an unbiased estimator of a

population mean because the mean of the sampling distribution of sample means

taken from the same population is equal to the population mean itself. We can say

that a statistic is an unbiased estimator if, on average, it tends to assume values

that are above the population parameter being estimated as frequently and to the

same extent as it tends to assume values that are below the population parameter

being estimated.

Efficiency refers to the size of the standard error of the statistic. If we compare

two statistics from a sample of the same size and try to decide which one is the

more efficient estimator, we would pick the statistic that has the smaller standard

error. Suppose we choose a sample of a given size and must decide whether to use

the sample mean or the sample median to estimate the population mean. If we

calculate the standard error of the sample mean and find it to be 1.05 and then

calculate the standard error of the sample median and find it to be 1.6, we would

say that the sample mean is a more efficient estimator of the population mean

because its standard error is smaller. It makes sense that an estimator with a

smaller standard error (with less variation) will have more chance of producing an

estimate nearer to the population parameter under consideration.

the sample size increases, it becomes almost certain that the value of the statistic

comes very close to the value of the population parameter. If an estimator is

consistent, it becomes more reliable with large samples.

in the sample that no other estimator could extract from the sample additional

information about the population parameter being estimated.

Learning Objective 3

Interval

Point estimates:

Results of a sample of 35 boxes of bolts (bolts per box)

101 103 112 102 98 97 93

105 100 97 107 93 94 97

97 100 110 106 110 103 99

93 98 106 100 112 105 100

114 97 110 102 98 112 99

Consider the table above. We have taken a sample of 35 boxes of bolts from a manufacturing line and have counted the bolts per box. We can estimate the population mean, i.e. the mean number of bolts per box, by taking the mean for the 35 boxes we have sampled, i.e. adding all the bolts and dividing by the number of boxes. Thus, using the sample mean x̄ as the estimator, we have a point estimate of the population mean µ.

Similarly, we can use the sample variance s² to estimate the population variance, where the sample variance s² is given by the formula s² = Σ(x – x̄)² / (n – 1).
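The point estimates for the bolt data can be computed directly from the 35 counts in the table. The sketch below uses Python's statistics module, whose variance function divides by n – 1; the variable names are illustrative.

```python
from statistics import mean, variance

bolts = [
    101, 103, 112, 102, 98, 97, 93,
    105, 100, 97, 107, 93, 94, 97,
    97, 100, 110, 106, 110, 103, 99,
    93, 98, 106, 100, 112, 105, 100,
    114, 97, 110, 102, 98, 112, 99,
]

x_bar = mean(bolts)            # point estimate of the population mean
s_squared = variance(bolts)    # sample variance with the n - 1 divisor

print(len(bolts), x_bar, round(s_squared, 2))
```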

Interval Estimates

compute this information from the sample data as either point estimates, or as

interval estimates. An interval estimate describes a range of values within

which a population parameter is likely to lie.

The marketing research director needs an estimate of the average life in months of

car batteries his company manufactures. We select a random sample of 200

batteries with a mean life of 36 months. If we use the point estimate of the sample

mean x as the best estimator of the population mean µ, we would report that the

mean life of the company’s batteries is 36 months.

The director also asks for a statement about the uncertainty that will be likely to

accompany this estimate, that is, a statement about the range within which the

unknown population mean is likely to lie. To provide such a statement, we need to

find the standard error of the mean.

If we select and plot a large number of sample means from a population, the distribution of these means will approximate a normal curve. Furthermore, the mean of the sample means will be the same as the population mean. Our sample size of 200 is large enough that we can apply the central limit theorem. Suppose we have already estimated the standard deviation of the population of the batteries and reported that it is 10 months. Using this standard deviation we can calculate the standard error of the mean using the formula σx̄ = σ/√n = 10/√200 = 0.707 months.

We can tell the director that our estimate of the life of the company’s batteries is 36 months, and the standard error that accompanies this estimate is 0.707. In other words, the actual mean life for all the batteries may lie somewhere in the interval estimate of 35.293 to 36.707 months. This is helpful but insufficient information for the director. Next, we need to calculate the chance that the actual life will lie in this interval or in other intervals of different widths that we might choose, ± 2σx̄ (2 x 0.707), ± 3σx̄ (3 x 0.707), and so on.

The probability is 0.955 that the mean of a sample size of 200 will be within ±2

standard errors of the population mean. Stated differently, 95.5 percent of all the

sample means are within ±2 standard errors from µ . “The population mean µ will

be located within ±2 standard errors from the sample mean 95.5 percent of the

time.”

Hence from the above example we can now report to the director that the best estimate of the life of the company’s batteries is 36 months, and we are 68.3 percent confident that the life lies in the interval from 35.293 to 36.707 months (36 ± 1σx̄). Similarly, we are 95.5 percent confident that the life falls within the interval of 34.586 to 37.414 months (36 ± 2σx̄), and we are 99.7 percent confident that battery life falls within the interval of 33.879 to 38.121 months (36 ± 3σx̄).
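The three intervals for the battery example follow from the standard error of the mean, σ/√n. The short sketch below recomputes them from the figures given in the example; the variable names are assumptions.

```python
import math

sample_mean = 36      # months, from the sample of 200 batteries
sigma = 10            # population standard deviation in months
n = 200               # sample size

standard_error = sigma / math.sqrt(n)

# Interval of k standard errors on either side of the sample mean
intervals = {k: (sample_mean - k * standard_error, sample_mean + k * standard_error)
             for k in (1, 2, 3)}

for k, (low, high) in intervals.items():
    print(k, round(low, 3), round(high, 3))
```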

In using interval estimates, we are not confined to ±1,2 and 3 standard errors; for

example, ± 1.64 standard errors includes about 90 percent of the area under the

curve; it includes 0.4495 of the area on either side of the mean in a normal

distribution. Similarly, ±2.58 standard error includes about 99 percent of the area,

or 49.51 percent on each side of the mean.

confidence level. This probability indicates how confident we are that the interval

estimate will include the population parameter. A higher probability means more

confidence. In estimation, the most commonly used confidence levels are 90

percent, 95 percent, and 99 percent, but we are free to apply any confidence level.

The confidence interval is the range of the estimate we are making. If we report

that we are 90 percent confident that the mean of the population of incomes of

people in a certain community will lie between Rs. 8,000 and Rs. 24,000, then the

range Rs. 8,000-Rs. 24,000 is our confidence interval. Often, however, we will

express the confidence interval in standard errors rather than in numerical values.

Thus, confidence limits are the upper and lower limits of the confidence interval.

In this case, x̄ + 1.64σx̄ is called the upper confidence limit (UCL) and x̄ – 1.64σx̄ is called the lower confidence limit (LCL).

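If the sample is large relative to a finite population, then we use the finite population multiplier to calculate the standard error. This is given from the previous unit as σx̄ = (σ/√n) x √[(N – n)/(N – 1)], where N is the population size and n is the sample size.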

population. For example, the government estimates by a sampling procedure the

unemployment rate, or the proportion of unemployed people, in the country’s

workforce.

We know that for a binomial distribution the mean and the standard deviation are

Mean µ = np

Standard deviation σ = √(npq)

where n = number of trials, p = probability of success and q = probability of failure = 1 – p.

To express the mean as a proportion of successes rather than a number of successes, we divide np by n, which gives p. So the mean of the sampling distribution of the proportion is µp̄ = p.

Similarly, we can modify the formula for the standard deviation of the binomial distribution, √(npq), which measures the standard deviation in the number of successes. To change the number of successes to the proportion of successes, we divide √(npq) by n and get σp̄ = √(pq/n).

Example: In a very large organization the director wanted to find out what

proportions of the employees prefer to provide their own retirement benefits in

lieu of a company – sponsored plan. A simple random sample of 75 employees

was taken and found that 40%, i.e. 0.4 of them are interested in providing their

own retirement plans. The management requests that we use this sample to find an

interval about which they can be 99 percent confident that it contains the true

population proportion.

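The estimated standard error of the proportion is √(p̄q̄/n) = √(0.4 x 0.6 / 75) = 0.057. Therefore the interval estimate for the 99% level of confidence is 0.4 ± 2.58 (0.057), i.e. 0.253 to 0.547.

Hence we can be 99 percent confident that the proportion of employees who wish to establish their own retirement plans lies between 0.253 and 0.547.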
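The interval for the retirement plan example can be recomputed as a check. This sketch uses the sample figures from the example and the z value of 2.58 for 99 percent confidence. Because it does not round the standard error to 0.057 before multiplying, its limits differ from the hand-worked ones in the third decimal place.

```python
import math

p_bar = 0.4        # sample proportion preferring their own retirement plan
n = 75             # sample size
z = 2.58           # z value for the 99 percent confidence level

standard_error = math.sqrt(p_bar * (1 - p_bar) / n)
lower = p_bar - z * standard_error
upper = p_bar + z * standard_error

print(round(standard_error, 3), round(lower, 3), round(upper, 3))
```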

So far, the sample sizes we were examining were all larger than 30. This is not always the case. How can we handle estimates where the normal distribution is not the appropriate sampling distribution, that is, when we are estimating the population standard deviation and the sample size is 30 or less? Suppose we have data from only, say, 10 weeks, or sample sizes less than 30. Fortunately, another distribution exists that is appropriate in these cases. It is called the t distribution.

in the early 1900s. Gosset was employed by the Guinness Brewery in Dublin,

Ireland, which did not permit employees to publish research findings under their

own names. So Gosset adopted the pen name Student and published under that

name. Consequently, the t distribution is commonly called Student’s t

distribution, or simply Student’s distribution.

Because it is used when the sample size is 30 or less, statisticians often associate

the t distribution with small sample statistics. This is misleading because the size

of the sample is only one of the conditions that lead us to use the t distribution.

The second condition is that the population standard deviation must be unknown.

Use of the t distributions for estimating is required whenever the sample size is 30

or less and the population standard deviation is not known. Furthermore, in using

the t distribution, we assume that the population is normal or approximately

normal.

Degrees of freedom

What are degrees of freedom? We can define them as the number of values we

can choose freely.

population mean, and we will use n – 1 degrees of freedom, where n is the sample

size. For example, if we use a sample of 20 to estimate a population mean, we

will use 19 degrees of freedom in order to select the appropriate t distribution.

With two sample values, we have one degree of freedom (2-1 = 1), and with

seven sample values, we have six degrees of freedom (7-1 = 6). In each of these

two examples, then, we had n-1 degrees of freedom, assuming n is the sample

size. Similarly, a sample of 23 would give us 22 degrees of freedom.

The table of t distribution values differs in construction from the z table or normal

distribution table used previously. The t table is more compact and shows areas

and t values for only a few percentages (10, 5, 2, and 1 Percent). Because there is

a different t distribution for each number of degrees of freedom, a more complete

table would be quite lengthy. Although we can conceive of the need for a more

complete table

A second difference in the t table is that it does not focus on the chance that the population parameter being estimated will fall within our confidence interval. Instead, it measures the chance that the population parameter we are estimating will not be within our confidence interval (that is, that it will lie outside it). If we are making an estimate at the 90 percent confidence level, we would look in the t table under the 0.10 column (100 percent – 90 percent = 10 percent). This 0.10 chance of error is symbolized by the Greek letter alpha (α). We would find the appropriate t values for confidence intervals of 95 percent, 98 percent, and 99 percent under the columns headed 0.05, 0.02, and 0.01, respectively.

A third difference in using the t table is that we must specify the degrees of freedom with which we are dealing. Suppose we make an estimate at the 90 percent confidence level with a sample size of 14, which gives 13 degrees of freedom. Look under the 0.10 column until you encounter the row labelled 13. Like a z value, the t value there of 1.771 shows that if we mark off plus and minus 1.771 σ̂x̄ (estimated standard errors of x̄) on either side of the mean, the area under the curve between these two limits will be 90 percent, and the area outside these limits (the chance of error) will be 10 percent.

Remember that in any estimation problem in which the sample size is 30 or less

and the standard deviation of the population is unknown and the underlying

population can be assumed to be normal or approximately normal, we use the t

distribution.

In all the examples above, the sample size was known. Now we try to determine the sample size n. If it is too small we may fail to achieve the objective; if it is too large we will be wasting resources. Let us examine some of the methods that are useful in determining what sample size is necessary for any specified level of precision.

IIM wants to conduct a survey of the annual earnings of its graduates in international placements. It knows from past experience that the standard deviation of the annual earnings of its population of students is $ 1500. How large a sample should be taken in order to estimate the mean annual earnings of last year’s class within $ 500 at the 95% level of confidence?

The problem states that the estimate should be within $ 500 on either side of the population mean. At the 95% level of confidence z = 1.96, so

z σx̄ = 500, i.e. 1.96 σx̄ = 500, so σx̄ = 255.1

σx̄ = σ/√n, i.e. 255.1 = 1500/√n, so √n = 5.88 and n = 34.57

This means n should be greater than 34.57, i.e. at least 35, if the institute wants to estimate the mean with the stated precision.
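The sample size calculation can be condensed into the single formula n = (zσ/E)², where E is the permitted error on either side of the mean. The sketch below applies it to the figures in this example; the symbol E and the variable names are introduced here for illustration.

```python
import math

z = 1.96          # z value for the 95 percent confidence level
sigma = 1500      # population standard deviation in dollars
E = 500           # permitted error on either side of the mean, in dollars

n_exact = (z * sigma / E) ** 2
n_required = math.ceil(n_exact)   # round up, since n must be a whole number

print(round(n_exact, 2), n_required)
```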

SUMMARY

In this unit we have seen point estimates and interval estimates. These are the foundation for inferential statistics in estimation and hypothesis testing, which we will be discussing in the next unit. We have also seen the concept of confidence levels and how to make estimates when the sample sizes are small and large. We have also worked in reverse to estimate the sample size required for a given level of accuracy. Finally, we have seen that if the sample size is 30 or less and the population standard deviation is not known, we use Student’s t distribution for estimation.

MB0024-Unit-09

Introduction

Hypothesis testing begins with an assumption, called a hypothesis, that we make about a

population parameter. We assume a certain value for a population mean. To test the

validity of our assumption, we gather sample data and determine the difference between

the hypothesized value and the actual value of the sample mean. Then we judge whether

the difference is significant. The smaller the difference, the greater the likelihood that our

hypothesized value for the mean is correct. The larger the difference, the smaller the

likelihood.

Unfortunately, the difference between the hypothesized population parameter and the

actual statistic is more often neither so large that we automatically reject our hypothesis

nor so small that we just as quickly accept it. So in hypothesis testing, as in most

significant real-life decisions, clear-cut solutions are the exception, not the rule.

Learning Objective 1

Assumptions: Although hypothesis testing sounds like some formal statistical term
completely unrelated to business decision making, in fact managers propose and test
hypotheses all the time. “If we drop the price of this car model by Rs. 1,500, we’ll sell
50,000 cars this year” is a hypothesis. To test this hypothesis, we have to wait until the
end of the year and count sales. Managerial hypotheses are based on intuition; the
marketplace decides whether the manager’s intuitions were correct. Hint: Hypothesis
testing is about making inferences about a population from only a small sample. The
bottom line in hypothesis testing is when we ask ourselves (and then decide) whether a
population like we think this one is would be likely to produce a sample like the one we
are looking at.

1. Testing Hypothesis

2. Null and Alternate hypothesis

In hypothesis testing, we must state the assumed or hypothesized value of the

population parameter before we begin sampling. The assumption we wish to test

is called the null hypothesis and is symbolized Ho.

Suppose we want to test the hypothesis that the population mean is equal to 500.
We would symbolize it as H0: µ = 500 and read it, “The null hypothesis is that the
population mean is equal to 500.” The term null hypothesis arises from earlier
agricultural and medical applications of statistics. In order to test the effectiveness
of a new fertilizer or drug, the tested hypothesis (the null hypothesis) was that it had
no effect, that is, there was no difference between treated and untreated samples.

We represent the hypothesized value of the population mean symbolically as µH0.

If our sample results fail to support the null hypothesis, we must conclude that

something else is true. Whenever we reject the hypothesis, the conclusion we do

accept is called the alternative hypothesis and is symbolized H1 (”H sub-one”).

For the null hypothesis H0: µ = 200, the alternative hypothesis is
H1: µ ≠ 200 (population mean is not equal to 200).

The purpose of hypothesis testing is not to question the computed value of the

sample statistic but to make a judgment about the difference between that sample

statistic and a hypothesized population parameter. The next step after stating the

null and alternative hypotheses, then, is to decide what criterion to use for

deciding whether to accept or reject the null hypothesis. If we assume the

hypothesis is correct, then the significance level will indicate the percentage of

sample means that is outside certain limits. (In estimation, please remember, the

confidence level indicated the percentage of sample means that fell within the

defined confidence limits.)

Hypotheses are accepted, not proved.

Even if our sample statistic does fall in the non-shaded region (the region that

makes up 95 percent of the area under the curve), this does not prove that our null

hypothesis (H0) is true; it simply does not provide statistical evidence to reject it.

Why? Because the only way in which the hypothesis can be accepted with

certainty is for us to know the population parameter; unfortunately, this is not

possible. Therefore, whenever we say that we accept the null hypothesis, we

actually mean that there is not sufficient statistical evidence to reject it. Use of the

term accept, instead of do not reject, has become standard. It means simply that

when sample data do not cause us to reject a null hypothesis, we behave as if that

hypothesis is true.

There is no single standard level of significance for testing hypotheses. In some
instances, a 5% level of significance is used. Published research results often test
hypotheses at the 1 percent level of significance. It is possible to test a hypothesis at
any level of significance. But remember that our choice of the minimum standard for an
acceptable probability, or the significance level, is also the risk we assume of rejecting
a null hypothesis when it is true. The higher the significance level we use for testing a
hypothesis, the higher the probability of rejecting a null hypothesis when it is true. A 5%
level of significance implies we are ready to reject a true hypothesis in 5% of cases.

If the significance level is high then we would rarely accept the null hypothesis

when it is not true but, at the same time, often reject it when it is true.

When testing a hypothesis we come across four possible situations:

1. Hypothesis is true, test result accepts it – we have made a right decision.

2. Hypothesis is true, test result rejects it – we have made a wrong decision
(Type I error). Its probability is known as the Producer’s Risk, denoted by α.

3. Hypothesis is false, test result accepts it – we have made a wrong decision
(Type II error). Its probability is known as the Consumer’s Risk, denoted by β.
1 – β is called the power of the test.

4. Hypothesis is false, test result rejects it – we have made a right decision.

Suppose that making a Type I error (rejecting a null hypothesis when it is true)

involves the time and trouble of reworking a batch of chemicals that should have

been accepted. At the same time, making a Type II error (accepting a null

hypothesis when it is false) means taking a chance that an entire group of users of

this chemical compound will be poisoned. Obviously, the management of this

company will prefer a Type I error to a Type II error and, as a result, will set very

high levels of significance in its testing to get low β s.

Suppose, on the other hand, that making a Type I error involves disassembling an

entire engine at the factory, but making a Type II error involves relatively

inexpensive warranty repairs by the dealers. Then the manufacturer is more likely

to prefer a Type II error and will set lower significance levels in its testing.
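The trade-off between the two risks can be made concrete with a short sketch. It computes β for a left-tailed z test at several significance levels. All numbers here (hypothesized mean 1,000, true mean 990, σ = 50, n = 100) are illustrative assumptions, not figures from the text, and the helper name `type_ii_error` is hypothetical.

```python
from statistics import NormalDist


def type_ii_error(mu0, mu_true, sigma, n, alpha):
    """beta for a left-tailed z test of H0: mu = mu0 when the true mean is mu_true."""
    std_err = sigma / n ** 0.5
    # H0 is rejected when the sample mean falls below this critical value.
    critical = mu0 + NormalDist().inv_cdf(alpha) * std_err
    # beta = P(sample mean >= critical) when the true mean is mu_true.
    return 1 - NormalDist(mu_true, std_err).cdf(critical)


# Assumed illustrative values: mu0 = 1000, true mean = 990, sigma = 50, n = 100.
betas = {alpha: type_ii_error(1000, 990, 50, 100, alpha) for alpha in (0.01, 0.05, 0.10)}
for alpha, beta in betas.items():
    print(f"alpha = {alpha:.2f}  beta = {beta:.3f}")
```

Under these assumptions β falls as α rises, which is why a company that fears a Type II error more sets a higher significance level.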

After deciding what level of significance to use, our next task in hypothesis

testing is to determine the appropriate probability distribution. We have a choice

between the normal distribution, and the t distribution. The rules for choosing the

appropriate distribution are similar to those we encountered in the unit on

estimation. The Table below summarizes when to use the normal and t

distributions in making tests of means. Later in this unit, we shall examine the

distributions appropriate for testing hypotheses about proportions.

Remember one more rule when testing the hypothesized values of a mean. As in

estimation, use the finite population multiplier whenever the population is finite in

size, sampling is done without replacement, and the sample is more than 5 percent

of the population.

Conditions for using the normal and t distributions in testing hypotheses about means

                                     When the Population            When the Population
                                     Standard Deviation is known    Standard Deviation is not known

Sample size n is larger than 30      Normal distribution, z-table   Normal distribution, z-table

Sample size n is 30 or less and      Normal distribution, z-table   t distribution, t-table
we assume the population is
normal or approximately so
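The rule summarized in the table can be written as a small lookup. This is a sketch only; the function name `choose_distribution` is hypothetical, and for samples of 30 or fewer it assumes, as the table does, that the population is normal or approximately so.

```python
def choose_distribution(n, sigma_known):
    """Return "z" or "t" following the table for tests about means."""
    if n > 30:
        return "z"  # large sample: normal distribution in both columns
    # Small sample (population assumed roughly normal):
    # z if the population S.D. is known, otherwise t.
    return "z" if sigma_known else "t"


print(choose_distribution(25, sigma_known=True))
print(choose_distribution(12, sigma_known=False))
```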

Learning Objective 2

A two-tailed test of a hypothesis will reject the null hypothesis if the sample mean

is significantly higher than or lower than the hypothesized population mean. Thus,

in a two-tailed test, there are two rejection regions. This is shown in figure 1 of

9.12.

A two-tailed test is appropriate when the null hypothesis is µ = µH0 (where µH0
is some specified value) and the alternative hypothesis is µ ≠ µH0.

Example 1: Suppose a manufacturer of light bulbs wants to produce bulbs
with a mean life of µ = µH0 = 1,000 hours. If the lifetime is shorter, he will lose
customers to his competitors; if the lifetime is longer, he will have a very high
production cost because the filaments will be excessively thick. In order to see
whether his production process is working properly, he takes a sample of the
output to test the hypothesis H0: µ = 1,000. Because he does not want to deviate
significantly from 1,000 hours in either direction, the appropriate alternative
hypothesis is H1: µ ≠ 1,000, and he uses a two-tailed test. That is, he rejects the
null hypothesis if the mean life of bulbs in the sample is either too far above
1,000 hours or too far below 1,000 hours.

However, there are situations in which a two-tailed test is not appropriate, and we

must use a one-tailed test.

Example 2: Consider the case of a wholesaler that buys light bulbs from the
manufacturer discussed earlier. The wholesaler buys bulbs in large lots and does
not want to accept a lot of bulbs unless their mean life is at least 1,000 hours. As
each shipment arrives, the wholesaler tests a sample to decide whether it should
accept the shipment. The company will reject the shipment only if it feels that the
mean life is below 1,000 hours. If it feels that the bulbs are better than expected
(with a mean life above 1,000 hours), it certainly will not reject the shipment
because the longer life comes at no extra cost. So the wholesaler’s hypotheses are
H0: µ = 1,000 hours and H1: µ < 1,000 hours. It rejects H0 only if the mean life of
the sampled bulbs is significantly below 1,000 hours. This situation is illustrated
in the figure below. From this figure, we can see why this test is called a
left-tailed test (or a lower-tailed test).

In general, a left-tailed (lower-tailed) test is used if the hypotheses are H0: µ = µH0

and H1: µ < µH0. In such a situation, it is sample evidence with the sample mean significantly

below the hypothesized population mean that leads us to reject the null hypothesis

in favor of the alternative hypothesis. Stated differently, the rejection region is in

the lower tail (left tail) of the distribution of the sample mean, and that is why we

call this a lower-tailed test.

A left-tailed test is one of two kinds of one-tailed tests. As you have probably

guessed by now, the other kind of one-tailed test is a right-tailed test (or an upper-tailed

test). An upper-tailed test is used when the hypotheses are H0: µ = µH0 and H1: µ > µH0.

Only values of the sample mean that are significantly above the hypothesized

population mean will cause us to reject the null hypothesis in favor of the

alternative hypothesis. This is called an upper-tailed test because the rejection

region is in the upper tail of the distribution of the sample mean.
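The three kinds of rejection region described above can be sketched as a single decision function for a z statistic. The function name `rejects_h0` and the tail labels are hypothetical; the sketch assumes the normal distribution applies.

```python
from statistics import NormalDist


def rejects_h0(z, alpha, tail):
    """Decide whether a calculated z falls in the rejection region."""
    std_normal = NormalDist()
    if tail == "two":    # H1: mu != mu_H0, alpha split between both tails
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if tail == "left":   # H1: mu < mu_H0, rejection region in the lower tail
        return z < std_normal.inv_cdf(alpha)
    if tail == "right":  # H1: mu > mu_H0, rejection region in the upper tail
        return z > std_normal.inv_cdf(1 - alpha)
    raise ValueError("tail must be 'two', 'left' or 'right'")


# The same z value can lead to different decisions depending on the test.
for tail in ("two", "left", "right"):
    print(tail, rejects_h0(-1.8, 0.05, tail))
```

With z = –1.8 at the 5% level, only the left-tailed test rejects, because the two-tailed test splits α between both tails and so needs a more extreme value.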

This is to remind you again that, in each example of hypothesis testing, when we

accept a null hypothesis on the basis of sample information, we are really saying

that there is no statistical evidence to reject it. We are not saying that the null

hypothesis is true. The only way to prove a null hypothesis is to know the

population parameter, and that is not possible with sampling. Thus, we accept the

null hypothesis and behave as if it is true simply because we can find no evidence

to reject it.

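Test No., Description of Test, Test Statistics and Notes

1. Test for specified proportion – infinite population
   Test statistic: Z = (Ps – P) / √(PQ / n)
   Notes: P = Population proportion, Ps = Sample proportion, Q = 1 – P, n = Sample size

2. Test for specified proportion – finite population
   Test statistic: Z = (Ps – P) / √{(PQ / n) x (N – n) / (N – 1)}
   Notes: P = Population proportion, Ps = Sample proportion, Q = 1 – P, n = Sample size,
   N = Population size

3. Test between proportions – different populations
   Test statistic: Z = (P1 – P2) / √(P1Q1 / n1 + P2Q2 / n2)
   Notes: P1 = first sample proportion, P2 = second sample proportion, Q1 = 1 – P1,
   Q2 = 1 – P2, n1 = first sample size, n2 = second sample size

4. Test between proportions – same population
   Test statistic: Z = (P1 – P2) / √{P0Q0 (1 / n1 + 1 / n2)}
   Notes: as in test 3, where P0 = (n1P1 + n2P2) / (n1 + n2) and Q0 = 1 – P0

5. Test for specified mean – infinite population
   Test statistic: Z = (µs – µ) / (σ / √n)
   Notes: µ = Population mean, µs = Sample mean, σ = Population S.D. We can use the
   sample S.D. if the population S.D. is not given.

6. Test for specified mean – finite population
   Test statistic: Z = (µs – µ) / {(σ / √n) x √((N – n) / (N – 1))}
   Notes: µ = Population mean, µs = Sample mean, N = Population size. We can use the
   sample S.D. if the population S.D. is not given.

7. Test between means – different populations
   Test statistic: Z = (µs1 – µs2) / √(σ1² / n1 + σ2² / n2)
   Notes: µs1, µs2 = first and second sample means, σ1, σ2 = the two S.D.s,
   n1, n2 = first and second sample sizes

8. Test between means – same population
   Test statistic: Z = (µs1 – µs2) / {σ √(1 / n1 + 1 / n2)}
   Notes: as in test 7, where σ is the common S.D.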

Test Procedure

Step 2: State the level of significance. This gives you the tabulated normal / t-value.

Step 3: Select the appropriate test from the list given in 9.2 and next chapter 10

Step 4: Calculate the required values for the test

Learning Objective 3

small sample test (will be discussed in unit 10).

Step 2: Check whether the data is attribute or variable. If the words mean and S.D.

are used, then it is a test for variables; otherwise it is a test for attributes.

Step 3: Check whether it is a test for specified value or between values. If two

sample sizes are given, then it is between values, otherwise it is for specified

value.

finite population. If it is between values test, check whether samples are from

different population or same population.

Step 6: If the words improved, more, higher, less, lower, effective, efficient,

superior, inferior, etc. are used, then it is a one-tailed test; otherwise it is a two-tailed test.

Examples:

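Example 1: Thompson Press hypothesizes that the average life of its latest web-offset
press is 14,500 hours. They know the S.D. of the press life is 2,100 hours. From a
sample of 25 presses, the company finds a sample mean of 13,000 hours. At 0.01
significance level, should the company conclude that the average life of the presses
is less than the hypothesized 14,500 hours?

Null hypothesis H0: µ = 14,500

Alternate hypothesis HA: µ < 14,500 (one-tailed test)

Level of significance: α = 0.01, Ztab = 2.33

Test Statistics: Z = (µs – µ) / (σ / √n) = (13,000 – 14,500) / (2,100 / √25) = –1,500 / 420

Therefore Zcal = –3.57

Test: |Zcal| = 3.57 > Ztab = 2.33

Conclusion: Since |Zcal| (3.57) > Ztab (2.33), H0 is rejected ⇒ The average life of the
presses is less than 14,500 hours.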
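The press-life example can be checked with a short sketch of the z test for a specified mean. The function name `z_test_mean` is hypothetical; the data (hypothesized mean 14,500, S.D. 2,100, n = 25, sample mean 13,000, α = 0.01) are taken from the example statement, and the population S.D. is treated as known.

```python
import math
from statistics import NormalDist


def z_test_mean(sample_mean, mu0, sigma, n):
    """z statistic for a specified mean with known population S.D."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


z = z_test_mean(13_000, 14_500, 2_100, 25)

# Left-tailed test at the 1% level: reject H0 when z is below the critical value.
critical = NormalDist().inv_cdf(0.01)
reject = z < critical
print(round(z, 2), round(critical, 2), reject)
```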

Example 2: Theater owners in India know that a hit movie ran for an average of
84 days with a standard deviation of 10 days in each city where the movie was screened.
A particular movie distributor was interested in comparing the popularity of movies
in his region with that of the population. He chose 75 theatres at random in the
region and found that a popular movie ran for an average of 81.5 days.

1. State appropriate hypotheses for testing whether there was a significant
difference between theatres in the distributor’s region and the population.

2. At a 1% significance level, test these hypotheses.

Null hypothesis H0: µ = 84

Alternate hypothesis HA: µ ≠ 84 (two-tailed test)

Level of significance: α = 0.01, Ztab = 2.58

Test Statistic: Z = (µs – µ) / (σ / √n) = (81.5 – 84) / (10 / √75) = –2.5 / 1.1547

Therefore, Zcal = –2.17

Test: |Zcal| = 2.17 < Ztab = 2.58

Conclusion: Since |Zcal| (2.17) < Ztab (2.58), H0 is accepted ⇒ There is no significant
difference between theatres in the distributor’s region and the population.

produce a new extra spicy brand of ketchup. The company’s market research team

found in a survey of 6000 households that 355 households would buy the extra

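spicy brand. An earlier, more extensive study carried out 2 years ago showed

that 5% of the households would buy the brand then. At the 2% level of

significance, should the company conclude that there is an increased interest in

the extra spicy flavour?

Null hypothesis H0: P = 0.05

Alternate hypothesis HA: P > 0.05 (one-tailed test)

Level of significance: α = 0.02, Ztab = 2.05

Given P = 0.05, Ps = 355 / 6000 = 0.05917, n = 6000, Q = 1 – P = 0.95

Test Statistics: Z = (Ps – P) / √(PQ / n) = (0.05917 – 0.05) / √(0.05 x 0.95 / 6000)
= 0.00917 / 0.002814 = 3.26

Test: Zcal = 3.26 > Ztab = 2.05

Conclusion: Since Zcal (3.26) > Ztab (2.05), H0 is rejected ⇒ There is an increased
interest in the extra spicy flavour.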
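The ketchup survey can be checked with a sketch of the z test for a specified proportion. The function name `z_test_proportion` is hypothetical; the data (355 of 6000 households, earlier proportion 5%, 2% level of significance, upper-tailed test) come from the example statement, and the population is treated as infinite.

```python
import math
from statistics import NormalDist


def z_test_proportion(successes, n, p0):
    """z statistic for a specified proportion (infinite population)."""
    p_s = successes / n                        # sample proportion Ps
    std_err = math.sqrt(p0 * (1 - p0) / n)     # sqrt(PQ / n)
    return (p_s - p0) / std_err


z = z_test_proportion(355, 6000, 0.05)

# Upper-tailed test at the 2% level: reject H0 when z exceeds the critical value.
critical = NormalDist().inv_cdf(1 - 0.02)
reject = z > critical
print(round(z, 2), round(critical, 2), reject)
```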

to wait to purchase the new OS Windows Vista, until an upgrade has been

released. After an advertising campaign to reassure the public, Microsoft surveyed

3000 buyers and found 950 who are still skeptical. At 5% level of significance

can the company conclude that the population of skeptical people had decreased?

(Null hypothesis is rejected. Use z distribution).

27. Test Statistics

= 3000

Therefore,

29. Test

30. Conclusion

standard deviation of 5 ml. A sample of 100 bottles, when measured, had a mean

content of 201.3 ml. Test whether the machine is functioning properly. Use the 5% level

of significance.

≠

µ s (two-tailed test)

33. Test Statistics

Therefore,

35. Test

36. Conclusion

Since Zcal (2.60) > Ztab (1.96) Ho is rejected ⇒ The machine is not

functioning properly.

SUMMARY

In this unit we discussed the four tests available for small samples. These tests can

be used for sample size (n ≤ 30) and samples whose population S.D are not

known. The different tests are illustrated with examples.

MB0024-Unit-10

Introduction

In the previous units we learned how to test hypotheses using data from either one or two

samples. We used one-sample tests to determine whether a mean or a proportion was

significantly different from a hypothesized value. In the two-sample tests, we examined

the difference between either two means or two proportions, and we tried to learn

whether this difference was significant.

Suppose we have proportions from five populations instead of only two. In this case, the
methods for comparing proportions described for two-sample hypothesis tests do not
apply; we must use the chi-square (χ²) test. Chi-square tests enable us to test whether
more than two population proportions can be considered equal.

Actually, chi-square tests allow us to do a lot more than just test for the equality of
several proportions. If we classify a population into several categories with respect to
two attributes (such as age and job performance), we can then use a chi-square test to
determine whether the two attributes are independent of each other.

Learning Objective 1

Characteristics of χ² test

• It is a non-parametric test: no rigid assumptions about the parameters of the
population are required.

• Additive property is also found in the χ² test.

• The χ² test is useful to test the hypothesis about the independence of attributes.

• The χ² test can be used in complex contingency tables.

• The χ² test is very widely used for research purposes in behavioural and social
sciences including business research.

• It is defined as χ² = ∑ (O – E)² / E, where O is an observed frequency and E is the
corresponding expected frequency.

Degrees of Freedom:

The number of degrees of freedom is n – k, denoted by ν, where n is the number of independent variates and k is the number of independent linear constraints imposed upon them.

Suppose we are asked to write any four numbers then we will have all the numbers of

our choice. If a restriction is applied or imposed to the choice that the sum of these

numbers should be 50; then the freedom of choice would be reduced to three only and

so the degrees of freedom would now be 3.

In general, if there are n variates and k linear relations are imposed upon them
(such as the estimation of some population parameter), then n is replaced by n – k.
For example, if the sum of squares is taken about the sample mean instead of the
population mean, then n is replaced by n – 1 = ν, since one linear constraint has been
imposed.

The sample observations should be independently and normally distributed. For this

either the parent population should be infinitely large (say, greater than 50) or

sampling should be done with replacement.

continuity is maintained only when the individual frequencies of the variate values
remain ≥ 5. So in applying the χ² test in the testing of goodness of fit or in a
contingency table, the cell frequency should not be less than 5. In practical problems
we can combine a few values of small frequencies into one so that the pooled
frequency is not less than 5.

Application of χ² test

χ² is used in testing: (i) the significance of sample variances, (ii) the goodness of fit
of a theoretical distribution, (iii) the independence in a contingency table and (iv)
whether the observed results are consistent with the expected segregations in breeding
experiments of Genetics.

Levels of significance

Tables have been prepared for the values of P, the probability of getting a value of
χ² ≥ χ0², where χ0² is an observed value. From these tables, we can find the value of P
corresponding to an observed value of χ² and then proceed to test whether the
difference between observed and theoretical frequencies is significant or not. The
smaller the value of P, the greater the divergence between fact and theory, so small
values lead us to suspect the hypothesis. Not only do small values of P lead us to suspect
the hypothesis, but a value of P very near to unity may also lead to a similar result. Thus
if P = 1, χ² = 0, showing that there is perfect agreement between fact and theory, which
is a very improbable event. There are two conventional levels of significance.

1. If P < 0.05, the observed value of χ² is significant at the 5% level of significance.

2. Similarly, if P < 0.01, the value is significant at the 1% level.

χ² = Σ (fo – fe)² / fe, where fo is the observed frequency and fe is the expected frequency.

1. Calculate the expected frequencies. In general the expected frequency for any cell
can be calculated from the following expression:
E = (row total x column total) / grand total

2. Take the difference between observed and expected frequencies and obtain the
squares of these differences, (O – E)².

3. Divide the values obtained in step 2 by the respective expected frequency and add
all the values to get the value of χ² according to the formula Σ (fo – fe)² / fe.

Interpretation

After ascertaining the χ² value, refer to the χ² table, which comprises columns headed
with symbols such as χ²0.05 for the 5% level of significance, χ²0.01 for the 1% level of
significance and so on. The left-hand side indicates the degrees of freedom. If the
calculated value of χ² falls in the acceptance region, the null hypothesis H0 is accepted,
and vice versa.

Learning Objective 2

Which is the sum of the squares of n independent standard normal variates, following the
χ² distribution with n degrees of freedom.

Properties of χ² distribution

2. S.D. of χ² distribution = √(2ν)

3. Median of χ² distribution divides the area of the curve into two equal parts, each
part being 0.5.

4. Mode of χ² distribution is equal to the degrees of freedom less 2, i.e., ν – 2.

5. The χ² distribution is always positively skewed.

6. χ² values increase with the increase in the degrees of freedom; there is a new χ²
distribution with every increase in the number of degrees of freedom.

7. The lowest value of χ² is zero and the highest is infinity, i.e., 0 ≤ χ² < ∞.

8. When two chi-squares χ1² and χ2² are independent, following χ² distributions with
n1 and n2 degrees of freedom, their sum χ1² + χ2² will follow the χ² distribution with
n1 + n2 degrees of freedom.

9. When ν > 30, √(2χ²) – √(2ν – 1) approximately follows the standard normal
distribution.

Conditions

1. The frequencies used in chi-square test must be absolute and not in relative terms.

2. The total no. of observations collected for this test must be large.

3. Each of the observations which make up the sample of this test must be

independent of each other.

4. As the χ² test is based wholly on sample data, no assumption is made concerning the
population distribution. In other words it is a non-parametric test.

5. The χ² test is wholly dependent on degrees of freedom.

6. The expected frequency of any item or cell must not be less than 5; if it is, the
frequencies of adjacent items or cells should be pooled together so that the pooled
frequency is not less than 5.

7. The data should be expressed in original units for convenience of comparison and

the given distribution should not be replaced by relative frequencies or

proportions.

8. This test is used only for drawing inferences through test of the hypothesis, so it

cannot be used for estimation of parameter value.

Uses of χ² test

• Test of goodness of fit for one-way classification or for one variable only.

• Test of independence or interaction for more than one row or column in the form
of a contingency table concerning several attributes.

• Test of population variance σ² through confidence intervals suggested by the χ²
test.

Application of χ² test

The number of degrees of freedom is given by (No. of rows – 1) x (No. of columns – 1).

Example 1: The following table gives the production in three shifts and the number of

defective goods that turned out in three weeks. Test at 5% level of significance whether

weeks and shifts are independent.

I 15 5 20 40

II 20 10 20 50

III 25 15 20 60

Total 60 30 60 150

Solution:

Observed (O)   Expected (E)            (O – E)²   (O – E)² / E
15             40 x 60 / 150 = 16      1          0.0625
20             50 x 60 / 150 = 20      0          0.0000
25             60 x 60 / 150 = 24      1          0.0417
5              40 x 30 / 150 = 8       9          1.1250
10             50 x 30 / 150 = 10      0          0.0000
15             60 x 30 / 150 = 12      9          0.7500
20             40 x 60 / 150 = 16      16         1.0000
20             50 x 60 / 150 = 20      0          0.0000
20             60 x 60 / 150 = 24      16         0.6667
χ²cal                                             3.6459

3. Test Statistics

4. Test χ²cal = 3.6459

5. Conclusion: Since χ²cal (3.6459) < χ²tab (9.49), H0 is accepted.
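The calculation in Example 1 can be reproduced with a short sketch. The function name `chi_square_independence` is hypothetical. The observed table is the one given in the example, and the table value 9.49 (5% level, 4 degrees of freedom) is taken from the conclusion above rather than computed.

```python
def chi_square_independence(observed):
    """Return (chi-square, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency = row total x column total / grand total.
            e = row_totals[i] * col_totals[j] / grand_total
            chi_sq += (o - e) ** 2 / e
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi_sq, dof


observed = [[15, 5, 20],
            [20, 10, 20],
            [25, 15, 20]]
chi_sq, dof = chi_square_independence(observed)
accept_h0 = chi_sq < 9.49  # tabulated value quoted in the example
print(round(chi_sq, 4), dof, accept_h0)
```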

Example 2: Out of 1000 people surveyed 600 belonged to urban area and rest to rural

area. Among 500 who visited other states 400 belonged to urban area. Test at 5% level of

significance whether area and visiting other states are dependent.

              Urban   Rural   Total
Visited       400     100     500
Not Visited   200     300     500
Total         600     400     1000

Observed Value (O)   Expected Value (E)   (O – E)²   (O – E)² / E
400                  300                  10000      33.33
200                  300                  10000      33.33
100                  200                  10000      50.00
300                  200                  10000      50.00
χ²cal                                                166.66

3. Test Statistics

4. Test χ²cal = 166.66

5. Conclusion: Since χ²cal (166.66) > χ²tab (3.84), H0 is rejected.

absenteeism is greater on one day of the week than on another day of the week. He has

the following record for the past years.

Solution: If the absenteeism is uniformly distributed over the week, then expected No. of

absenteeism per day should be

E = (66 + 57 + 54 + 48 + 75) / 5 = 300 / 5 = 60

Observed (O)   Expected (E)   (O – E)²   (O – E)² / E
66             60             36         0.6000
57             60             9          0.1500
54             60             36         0.6000
48             60             144        2.4000
75             60             225        3.7500
χ²cal                                    7.5000

Alternate hypothesis HA: Absenteeism and the day of the week are dependent

2. Level

3. Test Statistics

5. Conclusion: Since χ²cal (7.5) < χ²tab (9.49), H0 is accepted.
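The goodness-of-fit calculation for the absenteeism data can be sketched as follows. The function name `chi_square_goodness_of_fit` is hypothetical. The observed counts and the table value 9.49 come from the example, and the expected counts assume absences are spread uniformly over the five days.

```python
def chi_square_goodness_of_fit(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


absences = [66, 57, 54, 48, 75]
# Uniform hypothesis: every day has the same expected number of absences.
expected = [sum(absences) / len(absences)] * len(absences)

chi_sq = chi_square_goodness_of_fit(absences, expected)
accept_h0 = chi_sq < 9.49  # tabulated value for 4 d.f. at the 5% level
print(round(chi_sq, 4), accept_h0)
```

Because the calculated value is below the table value, the data give no evidence that absenteeism differs from day to day.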

types in a generation should be 9:3:3:1. In an experiment with 1600 beans the frequency

of bean of A, B, C and D type was observed to be 882, 313, 287 and 118 respectively

Does the result support the theory.

Solution:

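Observed (O)   Expected (E)              (O – E)²   (O – E)² / E
882            1600 x 9 / 16 = 900       324        0.36
313            1600 x 3 / 16 = 300       169        0.56
287            1600 x 3 / 16 = 300       169        0.56
118            1600 x 1 / 16 = 100       324        3.24
χ²cal                                               4.72

2. Conclusion: Since χ²cal (4.72) < χ²tab (7.81), H0 is accepted ⇒ The result supports
the theory.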

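Suppose we want to test whether the population has a given variance σ0². Then the
hypotheses are H0: σ² = σ0² and H1: σ² ≠ σ0², and the test statistic is
χ² = nS² / σ0², with n – 1 degrees of freedom.

If the calculated value lies between K1 and K2, then H0 is accepted. The K1 and K2
values are read from the table.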

randomly selected plants have heights 172, 156, 154, 163, 170, 169, 170 and 164 cms.

Test whether the sample standard deviation differs significantly?

Solution:

X       d = X – 160   d²
172     12            144
156     -4            16
154     -6            36
163     3             9
170     10            100
169     9             81
170     10            100
164     4             16
Total   38            502

S² = Σd² / n – (Σd / n)² = 502 / 8 – (38 / 8)² = 62.75 – 22.5625 = 40.1875

∴ nS² = 321.5

SUMMARY

attributes, goodness of fit and specified variance. It assumes that samples are

drawn at random and external forces, if any, act on them in equal magnitude. The

sample size should be very large. None of the theoretical expected values

calculated should be less than five.

MB0024-Unit-11

Introduction

and

are given by these formulae

It is also known as variance Ratio test. It has two degrees of freedom, one for numerator

and another for denominator of the ratio. They are represented by

ν 1 = n1 – 1 and ν 2 = n2 – 1.

Learning Objective 1

• The parent population from which they are drawn are normal.

Examples

Can we conclude that variance of time distribution for method I and method II are

same?.

Solution:

Method I

X       d = X – 22   d²
27      5            25
23      1            1
16      -6           36
20      -2           4
26      4            16
22      0            0
Total   2            82

Method II

X       d = X – 35   d²
33      -2           4
35      0            0
34      -1           1
27      -8           64
42      7            49
32      -3           9
38      3            9
Total   -4           136

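S1² = {Σd² – (Σd)² / n1} / (n1 – 1) = (82 – 2² / 6) / 5 = 16.266

S2² = {Σd² – (Σd)² / n2} / (n2 – 1) = (136 – (–4)² / 7) / 6 = 22.286

1. Null hypothesis H0: σ1² = σ2²

Alternate hypothesis HA: σ1² ≠ σ2²

2. Test Statistics: F = larger variance / smaller variance

3. Test

FCal = 22.286 / 16.266 = 1.37

4. Conclusion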
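The variance ratio for the two methods can be checked with a short sketch. It uses `statistics.variance`, which divides by n – 1, matching the ν1 = n1 – 1 and ν2 = n2 – 1 degrees of freedom quoted earlier. Placing the larger variance in the numerator is an assumption made here so that the ratio is at least 1.

```python
from statistics import variance

method_1 = [27, 23, 16, 20, 26, 22]
method_2 = [33, 35, 34, 27, 42, 32, 38]

# Sample variances with n - 1 in the denominator.
var_1 = variance(method_1)
var_2 = variance(method_2)

# Larger variance goes in the numerator.
f_ratio = max(var_1, var_2) / min(var_1, var_2)
dof = (len(method_2) - 1, len(method_1) - 1)  # (numerator, denominator) d.f.
print(round(var_1, 3), round(var_2, 3), round(f_ratio, 2), dof)
```

The calculated ratio is then compared with the tabulated F value for these degrees of freedom.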

Learning Objective 2

ANOVA: ANOVA will enable us to test for the significance of the differences among more
than two sample means. Using analysis of variance we will be able to make inferences
about whether our samples are drawn from populations having the same mean.

five different brands of gasoline, testing which of four different training methods produce

the fastest learning record, or comparing the first-year earnings of the graduates of half a

dozen different business schools. In each of these cases, we would compare the means of

more than two samples.

In statistical terms the difference between two statistical data is known as variance. When

two data are compared for any practical purpose, their difference is studied through the

techniques of Analysis of Variance. Initially the technique was applied in the field of

Zoology and Agriculture but in a later stage it was applied in other fields also. In analysis

of variance the degree of variance between two or more data as well as the factor

contributing towards the variance is studied.

with the view of testing whether the means of specific classification differ significantly or

they are homogeneous.

The analysis of variance is a method of splitting the total variation of data into constituent

parts which measure different sources of variations. The total variation is split up into the

following two-components.

Total variance = Variance between the samples + Variance within the samples.

After obtaining the above two variations, these two variations are tested for their

significance by F-test which is also known as Variance Ratio Test.

ii. To find a measure of variation between or among the components. Then the

significance of difference between the variations in two series or more may be

measured.

In other words, with the help of the technique of analysis of variance we can test the

hypothesis that the means of all the components constituting a population are equal to the

mean of the population or that the samples have come from the same population.

ANOVA Table

source of variance, the sum of squares, degrees of freedom, mean square (variance) and

the formula for the F-ratio is known as ANOVA table.

The actual analysis of variance is carried out on the basis of the ratio between the
variances. The variance ratio is obtained by dividing the variance between the samples
by the variance within the samples. The ratio forms the test statistic known as the
F-statistic, i.e., F = variance between the samples / variance within the samples.

Assumptions

ii. Population from which the samples are selected are normally distributed.

iv. Each of the population has the same variations and identical means.

The analysis of variance is mainly carried on under the following two classifications:

ii. Two way analysis of variance or two way classified data or manifold classification.

‘ANOVA’ table presents the various results obtained while carrying out the analysis of

variance. A specimen of ANOVA table is given below.

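Source of Variation   Sum of Squares   d.f.    Mean Square   F Ratio
Between Samples       SSC              K – 1   MSC           F = MSC / MSE
Within Samples        SSE              N – K   MSE
Total                 SST              N – 1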

Example 1: Below are given the yield (in Kg) per acre for 5 trial plots of varieties of

treatment.

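Plot No.   Treatment
           1 (X1)   2 (X2)   3 (X3)   4 (X4)
1.         42       48       68       80
2.         50       66       52       94
3.         62       68       76       78
4.         34       78       64       82
5.         52       70       70       66
Total      240      330      330      400

T = 240 + 330 + 330 + 400 = 1300

T² / N = 1300² / 20 = 84500

SST = Sum of squares of all observations – T² / N = 88736 – 84500 = 4236

SSC = (240² + 330² + 330² + 400²) / 5 – T² / N = 87080 – 84500 = 2580

SSE = SST – SSC = 4236 – 2580 = 1656

MSC = SSC / (K – 1) = 2580 / (4 – 1) = 2580 / 3 = 860

MSE = SSE / (N – K) = 1656 / (20 – 4) = 103.5

where K is the number of samples (treatments) and N is the total number of observations.

Source of Variation   Sum of Squares   d.f.         Mean Square   F Ratio
Between Samples       SSC = 2580       K – 1 = 3    MSC = 860     F = 8.3
Within Samples        SSE = 1656       N – K = 16   MSE = 103.5
Total                 SST = 4236       N – 1 = 19

F = MSC / MSE = 860 / 103.5 = 8.3

The table value of F at the 5% level of significance for (3, 16) d.f. is 3.24, which is
less than the calculated value of F. Therefore the null hypothesis is rejected: the
treatments do not have the same effect.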
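The one-way analysis above can be reproduced with a short sketch that follows the same SST, SSC and SSE steps. The function name `one_way_anova` is hypothetical; the yields are those of the example, and the table value 3.24 is the one quoted in the text.

```python
def one_way_anova(samples):
    """Return (SSC, SSE, F) for a one-way classification."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_total = sum(sum(s) for s in samples)
    correction = grand_total ** 2 / n_total                  # T^2 / N
    sst = sum(x ** 2 for s in samples for x in s) - correction
    ssc = sum(sum(s) ** 2 / len(s) for s in samples) - correction
    sse = sst - ssc
    msc = ssc / (k - 1)          # variance between the samples
    mse = sse / (n_total - k)    # variance within the samples
    return ssc, sse, msc / mse


treatments = [
    [42, 50, 62, 34, 52],   # X1
    [48, 66, 68, 78, 70],   # X2
    [68, 52, 76, 64, 70],   # X3
    [80, 94, 78, 82, 66],   # X4
]
ssc, sse, f_ratio = one_way_anova(treatments)
reject_h0 = f_ratio > 3.24  # tabulated F for (3, 16) d.f. at the 5% level
print(ssc, sse, round(f_ratio, 2), reject_h0)
```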

In the two way classification, observations are classified into groups on the basis of two

criteria.

Steps:

1. (a) Assume the means of all columns are equal. That is the effects of all factors in

one kind of treatment are equal.

i,e., α 1 = α 2 = α 3 =………α c

(b) Assume the means of all rows are equal. That is, the effects of all factors in

the second kind of treatment are equal.

i,e., β 1 = β 2 = β 3 =………β r

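2. Find SST = Sum of squares of all observations – T² / N

3. Find SSC = (Sum of squares of the column totals) / r – T² / N

4. Find SSR = (Sum of squares of the row totals) / c – T² / N

5. Find SSE = SST – SSC – SSR

6. MSC = SSC / (c – 1); MSR = SSR / (r – 1); MSE = SSE / {(c – 1)(r – 1)}

7. Fc = MSC / MSE; Fr = MSR / MSE

Here r is the number of rows and c is the number of columns.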

Anova Table for two way Analysis of Variance

Source of Variation   Sum of Squares   d.f.                Mean Square   F Ratio
Between Columns       SSC              c – 1               MSC           Fc
Between Rows          SSR              r – 1               MSR           Fr
Residual              SSE              (c – 1) x (r – 1)   MSE
Total                 SST              N – 1

Example 2: Three varieties of crops A, B, C are tested in a randomized block design
with four replications. The yields are given below:

Variety   Replications
          1    2    3    4
A         6    4    8    6
B         7    6    6    9
C         8    5    10   9

Test whether there is difference between replications. Test also whether varieties differ

significantly

Variety   1    2    3    4    Total
A         6    4    8    6    24
B         7    6    6    9    28
C         8    5    10   9    32
Total     21   15   24   24   84

T² / N = 84² / 12 = 588

SST = 6² + 7² + 8² + 4² + 6² + 5² + 8² + 6² + 10² + 6² + 9² + 9² – 588 = 624 – 588 = 36

SSC = (21² + 15² + 24² + 24²) / 3 – 588 = 606 – 588 = 18

MSC = SSC / (c – 1) = 18 / 3 = 6

SSR = (24² + 28² + 32²) / 4 – 588 = 596 – 588 = 8

MSR = SSR / (r – 1) = 8 / 2 = 4

SSE = SST – SSC – SSR = 36 – 18 – 8 = 10

MSE = SSE / {(r – 1)(c – 1)} = 10 / 6 = 1.667

Source of Variation   Sum of Squares   d.f.                    Mean Square   F Ratio
Between Columns       SSC = 18         c – 1 = 3               MSC = 6       Fc = 6 / 1.667 = 3.6
Between Rows          SSR = 8          r – 1 = 2               MSR = 4       Fr = 4 / 1.667 = 2.4
Residual              SSE = 10         (c – 1) x (r – 1) = 6   MSE = 1.667
Total                 SST = 36         N – 1 = 11

Between columns:

replications.

Between rows:

Therefore we accept the hypothesis that there is no significant difference between the

varieties.
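The two-way computation can be sketched in the same way. This is an illustrative check, assuming rows are varieties and columns are replications as in the table; the function name `two_way_anova` is an assumption and not part of the original text.

```python
# Two-way ANOVA (one observation per cell) for the crop-yield example.
yields = [
    [6, 4, 8, 6],    # variety A, replications 1..4
    [7, 6, 6, 9],    # variety B
    [8, 5, 10, 9],   # variety C
]

def two_way_anova(rows):
    """Return a dict with SST, SSC, SSR, SSE, the mean squares and F ratios."""
    r, c = len(rows), len(rows[0])
    n = r * c
    total = sum(sum(row) for row in rows)
    correction = total ** 2 / n                            # T^2 / N
    sst = sum(v * v for row in rows for v in row) - correction
    col_totals = [sum(row[j] for row in rows) for j in range(c)]
    ssc = sum(t * t for t in col_totals) / r - correction  # between columns
    ssr = sum(sum(row) ** 2 for row in rows) / c - correction  # between rows
    sse = sst - ssc - ssr                                  # residual
    msc, msr = ssc / (c - 1), ssr / (r - 1)
    mse = sse / ((c - 1) * (r - 1))
    return {"SST": sst, "SSC": ssc, "SSR": ssr, "SSE": sse,
            "MSC": msc, "MSR": msr, "MSE": mse,
            "Fc": msc / mse, "Fr": msr / mse}

result = two_way_anova(yields)
print({name: round(value, 3) for name, value in result.items()})
```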

SUMMARY

F-test is used to test the equality of two variances. ANOVA is used to test the equality of several means using the relation σx̄ = σ / √n.

MB0024-Unit-12

Introduction

Both correlation and regression are used to measure the strength of relationships between

variables.

The following statistical tools measure the relationship between the variable analyzed in

social science research.

1. Correlation

a. Simple correlation – Here the relationship between two variables is studied.

b. Partial correlation – Here the relationship between any two variables is studied, keeping all others constant.

c. Multiple correlation – Here the relationship among three or more variables is studied simultaneously.

2. Regression

a. Simple regression

b. Multiple regression

3. Association of Attributes

Correlation measures the relationship (positive, negative or perfect) between two variables. Regression analysis considers the relationship between variables and estimates the value of one variable from the known value of the other. Association of Attributes attempts to ascertain the extent of association between two attributes.

Learning Objective 1

Correlation

When two or more variables move in sympathy with each other, then they are said to be

correlated. If both variables move in the same direction then they are said to be positively

correlated. If the variables move in opposite direction then they are said to be negatively

correlated. If they move haphazardly then there is no correlation between them.

2. Testing the relationship for its significance.

3. Giving confidence interval for population correlation measure.

The correlation between two variables may be due to the following causes,

i) Due to small sample sizes. Correlation may be present in sample and not in

population.

ii) Due to a third factor. Correlation between yield of rice and tea may be due to a

third factor “rain”

Types of Correlation

Types of correlation are given below

a. Positive or Negative

Positive correlation: Both the variables (X and Y) will vary in the same direction. If

variable X increases, variable Y also will increase; if variable X decreases, variable Y

also will decrease. Negative Correlation: The given variables will vary in opposite

direction. If one variable increases, other variable will decrease.

Simple, Partial and Multiple correlations: In simple correlation, the relationship between two variables is studied. In partial and multiple correlations three or more variables are studied. In multiple correlation three or more variables are studied simultaneously. In partial correlation more than two variables are studied, but the effect of one variable is kept constant and the relationship between the other two variables is studied.

Linear and Non-Linear correlation: It depends upon the constancy of the ratio of change between the variables. In linear correlation a change in one variable is accompanied by a change in the other in a constant ratio, so the plotted points fall along a straight line. It is not so in non-linear correlation.

Measures of correlation

i) Scatter Diagram.

Scatter Diagram

The ordered pairs of observed values are plotted on the x-y plane as dots. Therefore it is also known as Dot Diagram. It is a diagrammatic representation of the relationship.

If the dots lie exactly on a straight line that runs from left bottom to right top, then the variables are said to be perfectly positively correlated (fig.i).

If the dots lie close to a straight line that runs from left bottom to right top, then the

variables are said to be positively correlated (fig.ii).

If the dots lie exactly on a straight line that runs from left top to right bottom then the

variables are said to be perfectly negatively correlated (fig iii).

If the dots lie very close to a straight line that runs from left top to right bottom then the

variables are said to be negatively correlated (fig iv).

If the dots lie all over the graph paper then the variables have zero correlation (fig v).

Scatter diagram tells us the direction in which they are related and does not give any

quantitative measures for comparison between sets of data.

It is defined as

1. i. …………………………….(A)

Where

n – number of paired observations

∑xy / N is called covariance of x and y. The other forms of this formula are

ii. ii.

For all practical purposes we can conveniently use form D. Whenever summary information is given, choose the proper form from A to C.
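For reference, the standard equivalent forms of Karl Pearson's coefficient can be written as below. This is a sketch in the notation used here, with x = X – X̄ and y = Y – Ȳ; which of these forms carries which of the labels A to D is not stated here and is left unassigned.

```latex
r \;=\; \frac{\sum xy}{N\,\sigma_x\,\sigma_y}
  \;=\; \frac{\sum xy}{\sqrt{\sum x^{2}\,\sum y^{2}}}
  \;=\; \frac{N\sum XY - \sum X \sum Y}
             {\sqrt{\bigl[N\sum X^{2} - (\sum X)^{2}\bigr]\,
                    \bigl[N\sum Y^{2} - (\sum Y)^{2}\bigr]}}
```

The last form works directly on the raw totals ∑X, ∑Y, ∑X², ∑Y² and ∑XY, which is why it is convenient for hand computation.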

• It is not affected by change of origin or change of scale.

• It is a relative measure (does not have any unit attached to it)

The size of r is very much dependent upon the variability of measured values in the

correlation sample. The greater the variability, the higher will be the correlation,

everything else being equal.

The size of r is altered when researchers select extreme groups of subjects in order to

compare these groups with respect to certain behaviors. Selecting extreme groups on one

variable increases the size of r over what would be obtained with more random sampling.

Combining two groups which differ in their mean values on one of the variables is not

likely to faithfully represent the true situation as far as the correlation is concerned.

Addition of an extreme case (and conversely dropping of an extreme case) can lead to

changes in the amount of correlation. Dropping of such a case leads to reduction in the

correlation while the converse is also true. (Source: Aggarwal.Y.P, Statistical Methods,

Sterling Publishers Pvt Ltd., New Delhi, 1998, p.131).

Problems

X 20 16 12 8 4

Y 22 14 4 12 8

X      Y      X²     Y²     XY
20     22     400    484    440
16     14     256    196    224
12      4     144     16     48
 8     12      64    144     96
 4      8      16     64     32
∑X = 60   ∑Y = 60   ∑X² = 880   ∑Y² = 904   ∑XY = 840

Applying the formula for r and substituting the respective values from the above

table we get r as:
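As a check on the substitution, the raw-sum form of r can be evaluated directly from the X and Y values. This is a minimal sketch; the function name `pearson_r` is illustrative and not part of the original text.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from raw sums: (N*SXY - SX*SY) / sqrt((N*SXX - SX^2)(N*SYY - SY^2))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

x = [20, 16, 12, 8, 4]
y = [22, 14, 4, 12, 8]
r = pearson_r(x, y)
print(round(r, 2))   # 0.7
```

With the table totals this is (5 × 840 – 60 × 60) / √[(5 × 880 – 60²)(5 × 904 – 60²)] = 600 / √736000, a moderately high positive correlation.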

Example 2: Calculate Karl Pearson Coefficient of Correlation from the following data:

Index of Production 100 102 104 107 105 112 103 99

Number of unemployed 15 12 13 11 12 12 19 26

Solution:

Year    Index of Production X   x = X – X̄   x²    No. unemployed Y   y = Y – Ȳ   y²     xy
1985    100                     -4          16    15                  0           0      0
1986    102                     -2           4    12                 -3           9     +6
1987    104                      0           0    13                 -2           4      0
1988    107                     +3           9    11                 -4          16    -12
1989    105                     +1           1    12                 -3           9     -3
1990    112                     +8          64    12                 -3           9    -24
1991    103                     -1           1    19                 +4          16     -4
1992     99                     -5          25    26                +11         121    -55
Total   ∑X = 832                ∑x = 0      ∑x² = 120   ∑Y = 120     ∑y = 0      ∑y² = 184   ∑xy = -92

X̄ = 832/8 = 104 ; Ȳ = 120/8 = 15
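Substituting the column totals into the deviation form of r gives the following worked sketch, using ∑x² = 120, ∑xy = –92 and ∑y² = 184 (the sum of the y² column):

```latex
r \;=\; \frac{\sum xy}{\sqrt{\sum x^{2}\,\sum y^{2}}}
  \;=\; \frac{-92}{\sqrt{120 \times 184}}
  \;=\; \frac{-92}{\sqrt{22080}}
  \;\approx\; -0.619
```

The negative sign indicates that unemployment tends to fall as the index of production rises.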

X 50 60 58 47 49 33 65 43 46 68

Y 48 65 50 48 55 58 63 48 50 70

Solution:

Using the formula for calculating r as

covariance (∑x,y) = -17.5. Find coefficient of correlation between x and y.

r = –17.5 / (7 × 3) = –0.833

There is a high negative correlation.

Example 5: Ten observation in Weight (x) and Height (y) of a particular age group gave

the following data.

Find “r”

Solution: we know

Probable Error

of testing the reliability of “r”. It is given by

It is used to

c) If P.E < r < 6 P.E, we cannot say anything about the significance of “r”.

ii) Construct confidence limits within which the population “ρ” is expected to lie.

Conditions under which P.E can be used.

2. The value of “r” must be determined from sample

values.

3. Samples must have been selected at random

Example 6

If r = 0.6 and N = 64, a) interpret ‘r’ b) find the limits within which ‘ρ’ is supposed to lie.

Solution:

It is highly significant

Limits for ρ = r ± P.E. = 0.6 ± 0.054

= 0.546 to 0.654
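The figures in Example 6 can be reproduced with the standard expression for the probable error, P.E. = 0.6745 (1 – r²) / √N. The sketch below assumes that expression (it yields the 0.054 used above); the function name `probable_error` is illustrative.

```python
from math import sqrt

def probable_error(r, n):
    """Probable error of r: 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r * r) / sqrt(n)

r, n = 0.6, 64
pe = probable_error(r, n)
highly_significant = r > 6 * pe          # rule: r greater than 6 P.E.
lower, upper = r - pe, r + pe            # limits for the population rho
print(round(pe, 3), highly_significant, round(lower, 3), round(upper, 3))
```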

ii) The variables under study are affected by a large number of independent causes so as to form a normal distribution.

When we do not know the shape of the population distribution and when the data are of the qualitative type, Spearman’s Rank correlation coefficient is used to measure the relationship.

It is defined as

ρ lies between – 1 and +1 and its interpretation is same as that of Karl

Pearson’s correlation coefficient.

There are 3 types of problems

Example 7: In a singing competition, two judges assigned the following ranks for 7

candidates. Find Spearman’s rank correlation coefficient.

Competitor 1 2 3 4 5 6 7

Judge I 5 6 4 3 2 7 1

Judge II 6 4 5 1 2 7 3

Solution:

Competitor   R1   R2   D = R1 – R2   D²
1            5    6    -1            1
2            6    4     2            4
3            4    5    -1            1
4            3    1     2            4
5            2    2     0            0
6            7    7     0            0
7            1    3    -2            4
                       ∑D = 0        ∑D² = 14

Student   Score on   Score on   Rank on   Rank on   Difference       Difference
          Test I     Test II    Test I    Test II   between Ranks    squared
          X          Y          R1        R2        D                D²
A         16          8         2         5         -3                9
B         14         14         3         3          0                0
C         18         12         1         4         -3                9
D         10         16         4         2          2                4
E          2         20         5         1          4               16
N = 5                                                                ∑D² = 38

Example 9: The sales statistics of 6 sales representatives in two different localities. Find

whether there is a relationship between buying habits of the people in the localities.

Representative 1 2 3 4 5 6

Locality I 70 40 65 110 60 20

Locality II 70 30 80 100 90 20

Solution:

Representative   R1   R2   D    D²
1                2    4    -2   4
2                5    5     0   0
3                3    3     0   0
4                1    1     0   0
5                4    2     2   4
6                6    6     0   0
Total                       0   8

There is high positive correlation between buying habits of the locality people.

iii When Ranks are repeated

Example 10

Student A B C D E F G H I J

Score on Test I 20 30 22 28 32 40 20 16 14 18

Score on Test II 32 32 48 36 44 48 28 20 24 28

Student   Score on   Score on   Rank on   Rank on   Difference       Difference
          Test I     Test II    Test I    Test II   between Ranks    squared
          X          Y          R1        R2        D                D²
A         20         32         6.5       5.5        1.0              1.00
B         30         32         3         5.5       -2.5              6.25
C         22         48         5         1.5        3.5             12.25
D         28         36         4         4          0                0
E         32         44         2         3         -1.0              1.00
F         40         48         1         1.5       -0.5              0.25
G         20         28         6.5       7.5       -1.0              1.00
H         16         20         9         10        -1.0              1.00
I         14         24         10        9          1.0              1.00
J         18         28         8         7.5        0.5              0.25
N = 10                                                               ∑D² = 24
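The three kinds of rank-correlation problem can be sketched in a few lines. The code assumes the usual formula ρ = 1 – 6 ∑D² / [N (N² – 1)] and, for repeated ranks, the common textbook adjustment of adding (m³ – m)/12 to ∑D² for each group of m tied values; that adjustment is an assumption here, and all function names are illustrative.

```python
def average_ranks(scores):
    """Rank scores from highest (rank 1) down; tied scores share the average rank."""
    ordered = sorted(scores, reverse=True)
    rank_of = {}
    for value in set(ordered):
        first = ordered.index(value) + 1          # 1-based position of first occurrence
        count = ordered.count(value)
        rank_of[value] = first + (count - 1) / 2  # mean of the positions occupied
    return [rank_of[s] for s in scores]

def tie_correction(scores):
    """Sum of (m^3 - m) / 12 over every group of m equal scores."""
    counts = {}
    for s in scores:
        counts[s] = counts.get(s, 0) + 1
    return sum((m ** 3 - m) / 12 for m in counts.values())

def spearman_rho(rank1, rank2, correction=0.0):
    """Spearman's rho from two lists of ranks, with an optional tie correction."""
    n = len(rank1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank1, rank2))
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))

def spearman_from_scores(xs, ys):
    """Rank the raw scores, apply the tie correction, and return rho."""
    correction = tie_correction(xs) + tie_correction(ys)
    return spearman_rho(average_ranks(xs), average_ranks(ys), correction)

# Example 7: ranks are given directly.
rho_7 = spearman_rho([5, 6, 4, 3, 2, 7, 1], [6, 4, 5, 1, 2, 7, 3])
# Example 9: ranks must first be assigned to the sales figures.
rho_9 = spearman_from_scores([70, 40, 65, 110, 60, 20], [70, 30, 80, 100, 90, 20])
# Example 10: repeated scores, so the tie correction applies.
rho_10 = spearman_from_scores([20, 30, 22, 28, 32, 40, 20, 16, 14, 18],
                              [32, 32, 48, 36, 44, 48, 28, 20, 24, 28])
print(round(rho_7, 3), round(rho_9, 3), round(rho_10, 3))
```

Averaging the positions of tied scores reproduces the ranks 6.5, 5.5, 1.5 and 7.5 shown in the table for Example 10.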

Testing of Correlation

“t” test is used to test correlation coefficient. Height and weight of a random sample of

six adults

Weight (Kg) 57 64 70 76 71 82

It is reasonable to assume that these variables are normally distributed, so the Karl Pearson Correlation coefficient is the appropriate measure of the degree of association between height and weight. r = 0.875

H1: ρ > 0. This implies that there is positive correlation in the population (increasing height is associated with increasing weight). The 5% significance level is taken.

Since the calculated value is more than the table value, the null hypothesis is rejected. There is significant positive correlation between height and weight.
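The test statistic behind this conclusion can be sketched as follows, assuming the usual form t = r √(n – 2) / √(1 – r²) with n – 2 degrees of freedom. The one-tailed 5% table value for 4 d.f., 2.132, is taken from standard t tables; the function name is illustrative.

```python
from math import sqrt

def t_for_correlation(r, n):
    """t statistic for testing rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

r, n = 0.875, 6
t_value = t_for_correlation(r, n)
table_value = 2.132                 # one-tailed, 5% level, 4 d.f.
reject_null = t_value > table_value
print(round(t_value, 2), reject_null)
```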

Partial Correlation

Partial Correlation is used in a situation where three or four variables are involved. Consider three variables such as age, height and weight. Correlation between height and weight can be computed by keeping age constant. Age may be an important factor influencing the strength of the relationship between height and weight. Partial Correlation is used to keep constant the effect of age. The effect of one variable is partialled out from the correlation between the other two variables. This statistical technique is known as partial correlation.

Partial Correlation is denoted by the symbol r12.3. It is the correlation between variables 1 and 2 keeping the 3rd variable constant.

r12.3 = Partial correlation between variables 1 and 2 keeping 3rd constant

Similarly,

and
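For reference, the standard first-order partial correlation coefficients, expressed through the simple (zero-order) coefficients r12, r13 and r23, are sketched below; the subscript notation follows the text.

```latex
r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{(1 - r_{13}^{2})(1 - r_{23}^{2})}}, \qquad
r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{(1 - r_{12}^{2})(1 - r_{23}^{2})}}, \qquad
r_{23.1} = \frac{r_{23} - r_{12}\,r_{13}}{\sqrt{(1 - r_{12}^{2})(1 - r_{13}^{2})}}
```

In each case the variable after the point is the one held constant.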

Multiple Correlation

Three or more variables are involved in multiple correlations. The dependent variable is

denoted by X1 and other variables are denoted by X2, X3 etc. Gupta S.P, has expressed

that “the coefficient of multiple linear correlation is represented by R1 and it is common

to add subscripts designating the variables involved. Thus R1.234 would represent the

coefficient of multiple linear correlations between X1 on the one hand X2, X3 and X4 on

the other. The subscript of the dependent variable is always to the left of the point:

The coefficient of multiple correlation can be expressed in terms of r12, r13 and r23. If the coefficient of multiple correlation is 1, it shows that the correlation is perfect. If it is 0, it shows that there is no linear relationship between the variables. The coefficients of multiple correlation are always positive in sign and range from 0 to +1.

formula for computing R1.23 is:

Similarly alternative formulas for R1.24 and R1.34 can be computed

The following formula can be used to determine a multiple correlation coefficient with

three independent variables.

Multiple correlation analysis measures the relationship between the given variables. In

this analysis the degree of association between one variable considered as the dependent

variable and a group of other variables considered as the independent variables.
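A minimal sketch of the computation, assuming the standard expression R1.23 = √[(r12² + r13² – 2 r12 r13 r23) / (1 – r23²)]. The zero-order coefficients used below are hypothetical illustration values, not the data of any example in this unit.

```python
from math import sqrt

def multiple_r(r12, r13, r23):
    """Coefficient of multiple correlation R1.23 from zero-order coefficients."""
    return sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2))

# Hypothetical zero-order coefficients, chosen only for illustration.
R = multiple_r(r12=0.6, r13=0.5, r23=0.4)
print(round(R, 3))
```

When the third variable is unrelated to both of the others (r13 = r23 = 0), the expression reduces to the absolute value of r12, which is a quick sanity check on the formula.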

Example 11: The following zero order correlation coefficients are given

Calculate multiple correlation coefficient treating first variable as dependent and second

and third variables as independent. (source: Gupta S.P, Statistical Method)

Solution:

Using the formula for multiple correlation coefficient for R1.23 we get:

Learning Objective 2

Regression

Regression is defined as, “the measure of the average relationship between two or more

variables in terms of the original units of the data.”

Correlation analysis attempts to study the relationship between the two variables x and y.

Regression analysis attempts to predict the average x for a given y. In Regression it is

attempted to quantify the dependence of one variable on the other. Example: There are

two variables x and y. y depends on x. The dependence is expressed in the form of the

equations.

Regression Analysis

Regression Analysis is used to estimate the values of the dependent variables from the

values of the independent variables.

Regression analysis is used to get a measure of the error involved while using the

regression line as a basis for estimation.

correlation that prevails between the given two variables.

Regression Lines

For a set of paired observations there exist two straight lines. The line drawn such that the sum of vertical deviations is zero and the sum of their squares is minimum is called the Regression line of y on x. It is used to estimate y-values for given x-values. The line drawn such that the sum of horizontal deviations is zero and the sum of their squares is minimum is called the Regression line of x on y. It is used to estimate x-values for given y-values. The smaller the angle between these lines, the higher is the correlation between the variables.

Y – Ȳ = byx (X – X̄)

X – X̄ = bxy (Y – Ȳ)

Where byx = ∑xy / ∑x²

And bxy = ∑xy / ∑y², with x = X – X̄ and y = Y – Ȳ

The regression equations found by the above conditions are said to be fitted by the method of least squares. byx and bxy are called regression coefficients.

• byx . bxy ≤ 1

• If byx is –ve, then bxy is also –ve and r is –ve.

• They can also be expressed as byx = r σy / σx and bxy = r σx / σy

• It is an absolute measure.

Correlation                                         Regression
rxy = ryx                                           byx ≠ bxy
–1 ≤ r ≤ 1                                          byx can be greater than one, but then bxy must be less than one such that byx . bxy ≤ 1
It has no units attached to it                      It has units attached to it
There exists nonsense correlation                   There is no such nonsense regression
It is not based on cause and effect relationship    It is based on cause and effect relationship
It indirectly helps in estimation                   It is meant for estimation

Examples :

Age of Husband 18 19 20 21 22 23 24 25 26 27

Age of Wife 17 17 18 18 19 19 19 20 21 22

Solution:

Age of husband (x)   dx = x – 22   dx²   Age of wife (y)   dy = y – 19   dy²   dx dy
18                   -4            16    17                -2            4      8
19                   -3             9    17                -2            4      6
20                   -2             4    18                -1            1      2
21                   -1             1    18                -1            1      1
22                    0             0    19                 0            0      0
23                    1             1    19                 0            0      0
24                    2             4    19                 0            0      0
25                    3             9    20                 1            1      3
26                    4            16    21                 2            4      8
27                    5            25    22                 3            9     15
Total 225             5            85    190                0           24     43

Regression equation of Y on X is

Y – Ȳ = byx (X – X̄)

⇒ Y – 19 = 0.521 (X – 22.5)

⇒ Y = 0.521X + 7.2775

r = 0.966
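The regression coefficients and r for this table can be checked with a short script. This is a minimal sketch working with deviations from the actual means rather than the assumed means 22 and 19, which gives the same coefficients; the function name `regression_summary` is illustrative.

```python
from math import sqrt

husband = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
wife = [17, 17, 18, 18, 19, 19, 19, 20, 21, 22]

def regression_summary(xs, ys):
    """Return (mean_x, mean_y, byx, bxy, r) using deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    byx = sxy / sxx                  # slope of the line of y on x
    bxy = sxy / syy                  # slope of the line of x on y
    r = sxy / sqrt(sxx * syy)        # equals sqrt(byx * bxy) with the sign of sxy
    return mean_x, mean_y, byx, bxy, r

mean_x, mean_y, byx, bxy, r = regression_summary(husband, wife)
intercept = mean_y - byx * mean_x    # constant term of the line of y on x
print(mean_x, mean_y, round(byx, 3), round(bxy, 3), round(r, 3))
```

The intercept computed from the unrounded slope differs slightly from 7.2775, which is obtained with byx rounded to 0.521.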

                          Series X   Series Y
Mean                      65         67
S.D                       2.5        3.5
Correlation coefficient   0.8

Solution:

The standard error of estimate helps to measure the accuracy of the estimated figures in regression analysis. If the value of the standard error of estimate is small, it shows that the estimate provided by the regression equation is better and closer. If the standard error of estimate is zero, it shows that there is no variation about the line and the correlation will be perfect. “The standard error of estimate is used to ascertain how good and representative the regression line is as a description of the average relationship between two series.”

also

also
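For reference, two standard expressions for the standard error of estimate of Y on X are sketched below. The symbol Y_c for the value of Y computed from the regression line is an assumption of notation.

```latex
S_{yx} = \sqrt{\frac{\sum (Y - Y_c)^{2}}{N}}
\qquad\text{and}\qquad
S_{yx} = \sigma_y \sqrt{1 - r^{2}}
```

The second form shows directly that the standard error is zero when r = ±1 and equals σy when r = 0.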

Example 13

1. The following results were worked out from scores in Statistics and Mathematics

in a certain examination.

Scores in Statistics (X) Scores in Mathematics (Y)

Mean 40 48

Standard Deviation 10 15

2.

Find the regression lines x on y and y on x. Use the regression lines to find the value of y

when x = 50 and value of x when y = 30.

Solution:

Regression line of X on Y is (X – X̄) = r (σx / σy) (Y – Ȳ) ………….(1)

Regression line of Y on X is (Y – Ȳ) = r (σy / σx) (X – X̄) ………….(2)

Y = 0.63 X + 22.80…………(4)
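The two estimates asked for can be sketched as below. The value r = 0.42 is an assumption inferred from equation (4), since a slope of 0.63 with σy / σx = 15 / 10 implies r = 0.42; the function names are illustrative.

```python
mean_x, mean_y = 40, 48      # mean scores in Statistics (X) and Mathematics (Y)
sd_x, sd_y = 10, 15          # standard deviations
r = 0.42                     # assumption: inferred from 0.63 = r * 15 / 10

byx = r * sd_y / sd_x        # regression coefficient of Y on X
bxy = r * sd_x / sd_y        # regression coefficient of X on Y

def y_on_x(x):
    """Estimate Y for a given X from the line (Y - mean_y) = byx (X - mean_x)."""
    return mean_y + byx * (x - mean_x)

def x_on_y(y):
    """Estimate X for a given Y from the line (X - mean_x) = bxy (Y - mean_y)."""
    return mean_x + bxy * (y - mean_y)

y_at_50 = y_on_x(50)
x_at_30 = x_on_y(30)
print(round(byx, 2), round(bxy, 2), round(y_at_50, 2), round(x_at_30, 2))
```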

X 12 4 20 8 16

Y 18 22 10 16 14

Solution

X       Y       X – X̄        Y – Ȳ        (X – X̄)²   (Y – Ȳ)²   (X – X̄)(Y – Ȳ)
                = X – 12     = Y – 16
12      18       0            2             0           4            0
 4      22      -8            6            64          36          -48
20      10       8           -6            64          36          -48
 8      16      -4            0            16           0            0
16      14       4           -2            16           4           -8
Total   60  80   0            0           160          80         -104

And

X – 12 = – 1.3 (Y – 16)

Y – 16 = – 0.65 (X – 12)

analysis, two or more independent variables are used to estimate the values of a

dependent variable, instead of one independent variable.

values of the two or more variables independent variables.

• To obtain the measure of the error involved in using the regression equation as a

basis of estimation.

• To obtain a measure of the proportion of variance in the dependent variable

accounted for or explained by the independent variables.

Multiple regression equation explains the average relationship between the given

variables and the relationship is used to estimate the dependent variable. Regression

equation refers the equation for estimating a dependent variable.

Example 14: Estimating dependent variable X1 from the independent variables X2,

X3………….. It is known as regression equation of X1 on X2, X3…………..

a1.23 = (Constant) the intercept made by the regression plane. It gives the value

x1 = (X1 – X̄1)

x2 = (X2 – X̄2)

x3 = (X3 – X̄3)

Σ x1x2 = b12.3 Σ x2² + b13.2 Σ x2x3

Σ x1x3 = b12.3 Σ x2x3 + b13.2 Σ x3²

Reliability of Estimates

Reliability of estimates tests whether the estimated value obtained by applying the regression equation is very close to the actual observed value. The standard error is used to measure the closeness of the estimate derived from the regression equation to the actual observed values. The measure of reliability is an average of the deviations of the actual values of the dependent variable from the estimates given by the regression equation. Determining the accuracy of estimates from the multiple regression is known as reliability of estimates. It is also known as the standard error of estimate.

Where

Multiple regression can be applied to test the factors such as export elasticity, import

elasticity and structural change (contribution of manufacturing sector towards GDP)

influencing over employment.

regression in their research work appropriately.

Summary

In this unit we studied the concept of correlation and regression and the different types of

correlation and regression. We saw how regression helps us to study unknown variables

with the help of known variables. It also establishes reliability measure for estimated

values.

MB0024-Unit-13

Introduction

The growing competition, rapidity of change in circumstances and the trend towards

automation demand that decisions in business are not based purely on guesses and

hunches rather on a careful analysis of data concerning the future course of events. The

future is unknown to us. Yet every day we are forced to make decisions involving future

and therefore uncertainty. Great risk is associated with business affairs. All businessmen

are forced to make forecast regarding business activities.

trade the importance of forecasting is so great, that when he enters into the business

world, he really enters the profession of forecasting. In recent times, considerable

research has been conducted in this field. Attempts are being made to make forecasting as

scientific as possible.

forecast; even if his whole product is sold before production. Forecasting has always been necessary. What is new is the attempt to put forecasting on a scientific basis, i.e., to forecast by reference to past history and statistics rather than by pure intuition and guess-work.

One of the most important tasks before businessmen and economists these days is to make estimates for the future. For example, a businessman is interested in finding out his likely sales next year, or for long-term planning in the next five or ten years, so that he could

adjust his production accordingly and avoid the possibility of either inadequate

production to meet the demand or unsold stocks. Similarly, an economist is interested in

estimating the likely population in the coming years so that proper planning can be

carried out with regard to jobs for the people, food supply etc. First step in making

estimates for the future consists of gathering information from the past. In this connection

we usually deal with statistical data which are collected, observed or recorded at

successive intervals of time. Such data are generally referred to as Time series. Thus

when we observe numerical data at different points of time the set of observations is

known as time series.

Learning Objective 1

Business Forecasting

Business forecasting refers to the analysis of past and present economic conditions with the object of drawing inferences about probable future business conditions. The process of making definite estimates of the future course of events is referred to as forecasting and the figure or statement obtained from the process is known as a ‘forecast’. The future course of events is rarely known. In order to be assured of the coming course of events, help is taken of an organized system of forecasting. There are two aspects of scientific business forecasting.

i. Analysis of past economic conditions: For this purpose, the components of time series are to be studied. The secular trend will show how the series has been moving in the past and what its future course is likely to be over a long period. The cyclic fluctuations would reveal whether the business activity is subjected to boom or depression. The seasonal fluctuations would indicate the seasonal changes in the business activity.

ii. Analysis of present economic conditions: The object of analyzing present economic

conditions is to study those factors which affect the sequential changes expected on the

basis of the past conditions. Such factors are new inventions, changes in fashion, changes

in economic and political spheres, economic and monetary policies of the Government,

war etc. These factors may affect and alter the duration of trade cycle. Therefore it is

essential to keep in mind the present economic conditions since they have an important

bearing on the probable future tendency.

Forecasting is a part of human conduct. Businessmen have also to look to the future.

Success in business depends on correct predictions. In fact when a man enters business,

he automatically takes with it the responsibility for attempting to forecast the future and

to a very large extent his success or failure would depend upon the ability to forecast

successfully the future course of events. Without some element of continuity between past, present and future, there would be little possibility of successful prediction. But history is not likely to repeat itself exactly, and we would hardly expect economic conditions next year or over the next ten years to follow a clear-cut pattern. Yet, frequently past patterns prevail sufficiently to justify using the past as a basis for predicting the future.

businessman in reducing the areas of uncertainty that surround management decision

making with respect to costs, sales, production, profits, capital investment, pricing,

expansion of production, extension of credit, development of markets, increase of

inventories and curtailment of loans. These decisions cannot be made off-hand. They are

to be based on present indications of future conditions.

While forecasting, we should know that it is impossible to forecast the future precisely; there must always be some range of error allowed in the forecast. Statistical forecasts are those in which we can use the mathematical theory of probability to measure the risks of errors in predictions.

A great amount of confusion seems to have grown up in the use of the words ‘forecast’, ‘prediction’ and ‘projection’. A prediction is an estimate based solely on past data of the series under investigation. It is purely mechanical extrapolation. A projection is a prediction where the extrapolated values are subject to certain numerical assumptions. A forecast is an estimate which relates the series in which we are interested to external factors. Forecasts are made by estimating future values of the external factors by means of prediction, projection or forecast and from these values calculating the estimate of the dependent variable.

i. Based on past and present conditions: Business forecasting is based on the past and present economic condition of the business. To forecast the future, various data, information and facts concerning the economic condition of the business for the past and present are analyzed.

ii. Based on mathematical and statistical methods: The process of forecasting includes the use of statistical and mathematical methods. By using these methods the actual trend which may take place in future can be forecast.

iii. Period: The forecasting can be made for long term, short term, medium term or any

specific term.

iv. Estimation of future: The business forecasting is to forecast the future regarding

probable economic conditions.

Steps in Forecasting

i. Understanding why changes in the past have occurred: One of the basic principles of

statistical forecasting is that the forecaster should use the data on past performance. The

current rate and changes in the rate constitute the basis of forecasting. Once they are

known various mathematical techniques can develop projections from them. If an attempt

is made to forecast business fluctuations without understanding why past changes have

taken place, the forecast will be purely mechanical, based solely upon the application of mathematical formulae and subject to serious error.

ii. Determining which phases of business activity must be measured: After it is known why business fluctuations have occurred, it is necessary to measure certain phases of business activity in order to predict what changes will probably follow the present level

of activity.

independent relationship between the selection of statistical data and determination of

why business fluctuations occur. Statistical data cannot be collected and analyzed in an

intelligent manner unless there is a sufficient understanding of business fluctuations. It is

important that reasons for business fluctuations be stated in such a manner that is possible

to secure data that are related to the reasons.

iv. Analyzing the data: Lastly, the data are analyzed in the light of understanding of the

reason why change occurs. For example, if it is reasoned that a certain combination of

forces will result in a given change, the statistical part of the problem is to measure these

forces, from the data available, to draw conclusions on the future course of action. The

methods of drawing conclusions may be called forecasting techniques.

Learning Objective 2

Almost all businessmen make forecasts about the business conditions related to their business. In recent years scientific methods of forecasting have been developed. The base of scientific forecasting is statistics. To handle the increasing variety of managerial forecasting problems, several forecasting techniques have been developed in recent years. Forecasting techniques vary from simple expert guesses to complex analysis of mass data. Each technique has its special use, and care must be taken to select the correct technique for a particular situation. Before applying a method of forecasting the following questions should be answered:

i. What is the purpose of the forecast, and how is it to be used?

ii. What are the dynamics and components of the system for which the forecast will be

made?

1. Business Barometers

Business indices are constructed to study and analyze the business activities on the basis of which future conditions are predetermined. As business indices are the indicators of future conditions, they are also known as “Business Barometers” or “Economic Barometers”. With the help of these business barometers the trend of fluctuations in business conditions is made known and by forecasting a decision can be taken relating to the problem. The construction of a business barometer draws on gross national product, wholesale prices, consumer prices, industrial production, stock prices, bank deposits etc. These quantities may be converted into relatives on a certain base. The relatives so obtained may be weighted and their average computed. The index thus arrived at is the business barometer.

general index of business activity which refers to weighted or composite

indices of individual index business activities. With the help of general index

of business activity long term trend and cyclical fluctuations in the economic

activities of a country are measured but in some specific cases the long term

trends can be different from general trends. These types of index help in

formation of country’s economic policies.

ii. Business barometers for specific business or industry: These barometers are

used as the supplement of general index of business activity and these are

constructed to measure the future variations in a specific business or industry.

barometer is constructed to measure the expected variations in a specific

individual firm of an industry.

Advantages:

i. The business barometer method is scientific and reliable and used by

management for the purpose of various business decisions at different levels.

business.

iii. The business barometers are the indicators of future business trends and help

to forecast the speed of fluctuations.

iv. This method helps to find solution of various business problems such as

development of market, capital investment, exploration of new consumer market

etc.

Disadvantages:

ii. In most cases, the business barometers provide inaccurate, incomplete and inconclusive forecasts because the index numbers are prepared on the basis of incorrect and inadequate data.

iii. The business barometers are the indicators of past conditions and the

forecasting based on these conditions may be erroneous.

iv. Separate indices are calculated for individual industry and firm which are

entirely different from general indices.

Time series analysis is also used for the purpose of making business forecasting.

The forecasting through time series analysis is possible only when the business

data of various years are available which reflects a definite trend and seasonal

variation. By time series analysis the long term trend, secular trend, seasonal and

cyclical variations are ascertained, analyzed and separated from the data of

various years.

Merits

iii. Reliable results of forecasting are obtained as this method is based on

mathematical model.

Demerits

iii. This method can only be used when the data for several years are available.

1. Extrapolation

businessman find out the possible trend of demand of his goods and about their

future price trends also. The accuracy of extrapolation depends on two factors:

ii. Knowledge about the course of events relating to the problem under

consideration.

In extrapolation we assume that the variable will follow the established pattern of

growth. For the purpose of business forecasting it is to determine accurately the

appropriate trend curve and the values of its parameters. some of these curves are:

i. Arithmetic trend: The straight line arithmetic trend assumes that growth will be

a constant amount each year.

ii. Semi log trend: It assumes a constant percentage increase each year. As the

annual increment is constant in logarithm, this line will become a straight line

when drawn on semi log paper.

iii. Modified exponential curve: The curve is given by y = ab^x. This relationship is

referred to as an exponential function. It assumes that each increment of growth

will be a constant per cent of the previous one.

iv. Logistic curve:

asymptote. A curve of this type is well suited to describe the

growth of industries as they pass through early periods of

experimentation, rapid growth as the product is perfected and

economies of scale make price reductions possible.

Yc = ab^{c^x}

scatter diagram of transformed variable.
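The difference between the arithmetic trend (constant amount each year) and the semi log trend (constant percentage each year) can be sketched in a few lines. This is only a rough illustration: the function names and the sales figures are made up, and the growth is estimated from the first and last values rather than from a fitted curve.

```python
# Hypothetical annual sales figures, invented for this illustration only.
sales = [100.0, 110.0, 121.0, 133.1]


def arithmetic_trend_forecast(values, periods_ahead):
    """Extend the series by its average absolute increment per period."""
    increment = (values[-1] - values[0]) / (len(values) - 1)
    return values[-1] + increment * periods_ahead


def semi_log_trend_forecast(values, periods_ahead):
    """Extend the series by its average growth ratio (constant percentage)."""
    ratio = (values[-1] / values[0]) ** (1 / (len(values) - 1))
    return values[-1] * ratio ** periods_ahead


next_year_arithmetic = arithmetic_trend_forecast(sales, 1)
next_year_semi_log = semi_log_trend_forecast(sales, 1)
```

For a series that grows by a steady percentage, the semi log forecast lies above the arithmetic one, which is why the choice of trend curve matters in extrapolation.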

Merits:

production.

events because it is a simple method.

mathematical method.

Demerits:

mathematical formulation.

2. Regression Analysis

The regression approach offers many valuable contributions to the solution of the

forecasting problem. It is the means by which we select from among the many

possible relationships between variables in a complex economy those which will

be useful for forecasting. A regression relationship may involve one predicted or

dependent variable and one independent variable (simple regression), or it may involve

relationships between the variable to be forecast and several independent

variables (multiple regression). Statistical techniques to estimate the

regression equations are often fairly complex and time-consuming but there are

many computer programs now available that estimate simple and multiple

regressions quickly.

gained in popularity for forecasting. The term econometrics refers to the

application of mathematical economic theory and statistical procedures to

economic data in order to verify economic theorems. Models take the form of a

set of simultaneous equations. The values of the constants in such equations are

supplied by a study of statistical time series, and a large number of equations may

be necessary to produce an adequate model.

At the present time, most short-term forecasting uses only statistical methods with

little qualitative information. However, in the years to come when most large

companies develop and refine econometric models of their major business, this

tool of forecasting will become more popular.

Merits:

i. Accurate and reliable results are obtained under this method because it is

a scientific method where computer is used.

ii. This method explains in detail and in quantitative terms the way in

which various aspects of the economy are interrelated.

Demerits:

ii. This method can be used only when adequate series of data is available.

activity.

4. Exponential Smoothing Method

compared to other methods. Exponential smoothing is a special kind of

weighted average and is found extremely useful in short-term forecasting

of inventories and sales.
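A minimal sketch of simple exponential smoothing is given here, since the text describes it only as a special kind of weighted average. The smoothing constant alpha, the choice of the first observation as the starting value and the sales figures are all assumptions made for illustration.

```python
def exponential_smoothing(values, alpha):
    """Return the exponentially smoothed series.

    Each smoothed value is alpha * (latest actual) + (1 - alpha) * (previous
    smoothed value), so recent observations carry more weight than older ones.
    The last smoothed value can serve as the forecast for the next period.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in the interval (0, 1]")
    smoothed = [float(values[0])]  # assumption: start from the first observation
    for actual in values[1:]:
        smoothed.append(alpha * actual + (1 - alpha) * smoothed[-1])
    return smoothed


monthly_sales = [20, 24, 22, 26]  # hypothetical figures
smoothed_sales = exponential_smoothing(monthly_sales, alpha=0.5)
next_month_forecast = smoothed_sales[-1]
```

A larger alpha makes the forecast follow recent sales more closely, while a smaller alpha gives a smoother series.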

context of the forecast, the relevance and availability of historical data, the

degree of accuracy desired, the time period for which forecasts are

required, the cost benefit of the forecast to the company, and the time

available for making the analysis.

The forecaster should use a technique that makes the best use of available

data. Furthermore, where a company wishes to forecast with reference to a

particular product, it must consider the stage of the product's life cycle for

which it is making the forecasts.

There are a few theories that are followed while making business forecasts. Some

of them are:

assumption that most of the business data have the lag and lead relationships, i.e.,

changes in business are successive and not simultaneous. There is time-lag

between different movements.

inflationary pressure: the purchasing power of people goes up, and the wholesale

prices and then the retail prices start rising. With the rise in retail prices the cost of living

goes up and with it there is a demand for increased wages. Thus, one factor, i.e.,

more money in circulation, has affected various fields of economic activity not

simultaneously but successively.

Merits

accuracy.

understand.

iv. Government can use this technique for the purpose of economic

stability of the economy by exercising control over possible losses.

Demerits

ii. This method cannot be regarded as accurate, because statistical

techniques give results that are close to the truth but not exact.

This theory is based on two assumptions: every action has a reaction, and the

magnitude of the original action influences the reaction. Thus if the price of rice

has gone up above a certain level in a certain period, there is likelihood that after

some time it will go down below the normal level. Thus, according to this theory,

whether a certain level of business activity is normal or abnormal, conditions cannot

remain so for ever. Thus, we find four phases of a business cycle.

i. Prosperity

ii. Decline

iv. Improvement

Merits

ii. By this theory more reliable results can be obtained because this theory

gives attention to action and reaction of event.

Demerits

The basic assumption of this theory is that history repeats itself and hence it assumes

that all economic and business events behave in a rhythmic order. According to

this theory, the speed and time of all business cycles are more or less same and by

using statistical and mathematical methods a trend is obtained which will

represent a long term tendency of growth or decline.

It is done on the basis of the assumption that the trend line denotes the normal

growth or decline of business events.

Merits

Demerits

i. The business events are not strictly periodic and prediction of business

cycle on the basis of statistical method is not satisfactory.

ii. Past conditions are given more weightage than the present conditions.

‘History repeats itself’ is the main foundation of this theory. Whatever happened in

the past under a set of circumstances is likely to happen in future also if

conditions are the same. A time series relating to the data in question is

thoroughly scrutinized and from it such period is selected in which conditions

were similar to those prevailing at the time of making the forecast but it is largely

dependent on past data.

Merits

i. It is an easy method

ii. As the future is forecasted on the basis of past business conditions, the

forecasting will be more reliable.

Demerits

method because the past and present conditions are rarely found to be

similar.

ii. It is very difficult to select the past period with the same business

conditions like present.

this method, the combined effects of various factors are not studied. The effect of

each factor is studied independently. Under this theory, forecasting is made on the

basis of analysis and interpretation of present conditions because the past events

have no relevance to present conditions.

Merits

Demerits

ii. Past facts are equally important for the purpose of forecasting but in this

method no weightage is given to the past.

iii. The forecasting made on the basis of this technique cannot be regarded

reliable.

Utility of Business Forecasting

Business forecasting helps the businessman and industrialists to form the policies

and plans related with their activities. On the basis of the forecasting the

businessman can forecast the demand of the product, price of the product,

condition of the market etc. Business decisions can also be reviewed on the

basis of business forecasting. The main advantages of business forecasting are:

out with the purpose of earning maximum profits, so by forecasting the future

price of the product and its demand the businessman can predetermine the

production cost, production and the level of stock to be held. Thus, business

forecasting is regarded as the key of success of business.

management decisions because in present time the management has to take the

decision in the atmosphere of uncertainties. Also, the business forecasting

explains the future conditions and enables the management to select the best

alternative.

control the circulation of money, modify the economic, fiscal and monetary

policies to avoid the adverse effects of trade cycles. So, with the help of

forecasting the government can control the expected fluctuations in future.

iv. Basis for capital market: The business forecasting helps in estimating the

requirement of capital, position of stock exchange and the nature of investors etc.

v. Useful in controlling the business cycles: The trade cycles cause various

depressions in the business such as sudden change in price level, increase in the

risk of business, increase in unemployment etc. By adopting a systematic business

forecasting the businessman and government can handle and control the

depression of trade cycles.

vi. To achieve the goals: The business forecasting helps to achieve the objective of

business through proper planning of business activities.

speculation, uneconomic activities and corruption can be controlled.

viii. Utility to society: With the help of business forecasting the entire society is

also benefited because the adverse effects of fluctuations in the conditions of

business are kept under control.

The business forecasting cannot be accurate due to various limitations which are

as follows:-

which are not sure to exist.

mathematical method. But the use of these methods cannot claim to be able to

make uncertain future certain.

simultaneously. In such a case the results of forecasting will be misleading.

iv. The forecasting cannot guarantee the elimination of errors and mistakes. The

managerial decision will be wrong if the forecasting is wrong.

v. Factors responsible for economic changes are often difficult to discover and to

measure. Hence business forecasting becomes an unnecessary exercise.

vii. The forecasting is made on the basis of past information and data and relies

on the assumption that economic events are repeated under the same conditions. But there may be

circumstances where these conditions are not repeated.

continuous attention.

Summary

forecasting, steps involved in forecasting and different methods available. Finally

we concluded it with the utility of business forecasting.

MB0024-Unit-14

Introduction

A time series is a set of numerical values of a given variable listed at successive intervals

of time. That is, the data regarding the variable is listed in chronological order. Usually

the interval of time is taken as uniform.

Example: Yearly production of wheat in the country, hourly temperature of a city,

bimonthly electricity bills etc. Almost all the data like industrial production, agricultural

production, exports, imports, dairy products can be arranged in chronological order.

Learning Objective 1

i. Study the forces that influence the variations in time series, and

ii. Study the behaviour of phenomenon over the given period of time.

For example, consider the sale of T.V sets (in thousands) by a producing company

Number sold (in thousands) 12 14 16 12 10 18

We would like to analyse the above data and give some trends about the sales. For

example, the company would like to know as to why the sales dropped in 1998 and 1999,

and then why the sales increased. That is, the company would like to analyse the various

forces that affect the sales.

There can be changes in the values of the variable recorded over different points of time

due to various forces. Analysing the effect of all such forces on the values of the variable

is generally known as the analysis of time series. Broadly there can be four types of

changes in the values of the variable as discussed below:

i. Changes which generally occur due to general tendency of the data to increase

or decrease.

ii. Changes which occur due to change in climate, weather conditions, festivals

etc.

iii. Changes which occur due to booms and depressions.

iv. Changes which occur due to some unpredictable forces like floods, famines,

earthquakes etc.

Learning Objective 2

The behaviour of a time series over periods of time is called the movement of the

time series. The time series is classified into the following four components:

regular long term growth or decline of the series. This movement can be

characterised by a trend curve. If this curve is a straight line, that is called a

trend line. If the variable is increasing over a long period of time, then it is

called an upward trend. If the variable is decreasing over a long period of

time, then it is called downward trend. If the variable moves upward or

downwards along a straight line then the trend is called a linear trend,

otherwise it is called a non-linear trend.

2. Seasonal Variations: Variations in a time series that are periodic in nature

and occur regularly over short periods of time during a year are called

seasonal variations. By definition, these variations are precise and can be

forecasted.

Examples:

i. The prices of vegetables drop down after rainy season or in winter months

and they go up during summer, every year.

ii. The prices of cooking oils reduce after the harvesting of oil seeds and go up

after some time.

rises and declines in the values of the variable are called cyclic variations.

Since these are long-term oscillations in the time series, the period of

oscillation is usually greater than one year. The oscillations are about a trend

curve or a trend line. The period of one cycle is the time-distance between two

successive peaks or two successive troughs.

4. Random Variations: These are called irregular movements. Movements

that occur usually in brief periods of time, without any pattern and are

unpredictable in nature are called irregular movements. These movements do

not have any regular period or time of occurrences. Example: The effects of

national strikes, floods, earthquakes etc. It is very difficult to study the

behaviour of such a time series.

Learning Objective 3

We shall be studying the following methods of measuring the trend of a time series:

trend curve. We plot the values of the variable against time on a graph paper

and join these points. The trend line is then fitted by inspecting the graph of

the time series. Fitting a trend line by this method is arbitrary. The trend line

is usually drawn such that the numbers of fluctuations on either side are

approximately the same. The trend line should be a smooth curve. The free

hand method has some disadvantages. They are:

ii. It cannot be used for any predictions of trends, as drawing the trend curve is

arbitrary.

Example: Find trend with the help of freehand curve method for the data given below:

Year                        1991  1992  1993  1994  1995  1996  1997  1998  1999  2000  2001

Production (in lakh ton)      15    18    16    22    19    24    20    28    22    30    26

Solution:

1. Semi-Average Method:

The methods of fitting a linear trend with the help of semi average method are as follows:

i. The number of years is even: The data of the time series are divided into

two equal parts. The items in each part are totalled and the total is

then divided by the number of items to obtain arithmetic means of the two

parts. Each average is then centred in the period of time from which it has

been computed and plotted on the graph paper. A straight line is drawn

passing through these points. This is the required trend line.

ii. The number of years is odd: When the number of years is odd, the value

of the middle year is omitted to divide the time series into two equal parts.

Then the procedure (i) is followed.

A trend value of any future year may be predicted by multiplying the periodic increment

by the number of years into the future that is desired and adding the result to the last

trend value listed in the series.
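The semi-average procedure can be sketched as follows, using the production figures for 1991 to 2001 from the freehand example. The function names are illustrative only. With eleven years the middle year (1996) is left out, as rule (ii) requires.

```python
def semi_average_trend(years, values):
    """Return (centre of first half, mean of first half, yearly increment)."""
    n = len(values)
    half = n // 2  # with an odd number of years the middle year is omitted
    first_years, first_values = years[:half], values[:half]
    second_years, second_values = years[n - half:], values[n - half:]
    centre_1 = sum(first_years) / half
    centre_2 = sum(second_years) / half
    mean_1 = sum(first_values) / half
    mean_2 = sum(second_values) / half
    # Slope of the straight line through the two semi-average points.
    increment = (mean_2 - mean_1) / (centre_2 - centre_1)
    return centre_1, mean_1, increment


def trend_value(year, centre_1, mean_1, increment):
    """Read the trend value for any year off the semi-average line."""
    return mean_1 + increment * (year - centre_1)


years = list(range(1991, 2002))
production = [15, 18, 16, 22, 19, 24, 20, 28, 22, 30, 26]
centre_1, mean_1, increment = semi_average_trend(years, production)
forecast_2003 = trend_value(2003, centre_1, mean_1, increment)
```

Here the two semi-averages are 18 (centred on 1993) and 25.2 (centred on 1999), so the periodic increment is about 1.2 lakh ton per year.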

Merits:

ii. The trend line can be extended on either side in order to obtain past or future

estimates.

iii. This is an objective method, as anyone applying this method gets the same

trend line.

Demerits:

i. The method of semi average assumes a straight line relationship between the

plotted points, regardless of the fact whether such relationship exists or not.

ii. This method has an in built limitation of arithmetic mean. This method is not

suitable in case of very low or very large extreme values.

This method is used for smoothing the time series. That is, it smoothens the

fluctuations of the data by the method of moving averages.

method, we use the following method:

iii. Compute moving totals according to the length of the period of moving

average.

If the length of the period of moving average is 3, i.e., a 3-yearly moving average is

to be calculated, compute moving totals as follows:

a + b + c, b + c + d, c + d + e, d + e + f…..

a + b + c + d + e, b + c + d + e + f, c + d + e + f + g…..

Placing the moving totals at the centre of the time span from which they are

computed.

iv. Compute moving averages by dividing the moving totals in step (iii) by the length of the

period of moving average and place them at the centre of the time span from

which the moving totals are computed. These moving averages are also called the

trend values.

By plotting these trend values (if desired) one can obtain the trend curve with the help of

which we can determine the trend whether it is increasing or decreasing.

If needed, one can also compute short-term fluctuations by subtracting the trend values

from the actual values.
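The steps for an odd period can be sketched as below. The function name is illustrative, and the series used is the ten-year production series (in lakh ton) quoted in this unit; years without a full window are marked with None.

```python
def moving_average_trend(values, period):
    """Return (moving totals, moving averages) placed at the centre of each span.

    The period must be odd so that every total falls on an actual time period.
    """
    if period % 2 == 0:
        raise ValueError("use a centred moving average for an even period")
    half = period // 2
    n = len(values)
    totals = [None] * n
    averages = [None] * n
    for centre in range(half, n - half):
        window = values[centre - half:centre + half + 1]
        totals[centre] = sum(window)
        averages[centre] = totals[centre] / period
    return totals, averages


production = [15, 18, 16, 22, 19, 24, 20, 28, 22, 30]
totals, trend = moving_average_trend(production, 3)

# Short-term fluctuations: actual value minus trend value, where a trend exists.
fluctuations = [None if t is None else y - t for y, t in zip(production, trend)]
```

The first and last entries stay empty because a 3-yearly window cannot be centred on them.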

Illustrative Example:

Year                        1988  1989  1990  1991  1992  1993  1994  1995  1996  1997

Production (in lakh ton)      15    18    16    22    19    24    20    28    22    30

Year   Y (tonnes)   3-yearly moving totals   3-yearly moving averages Yc   Short-term fluctuations (Y – Yc)
1988   21           -                        -                             -
1989   22           66                       22.00                          0.00
1990   23           70                       23.33                         -0.33
1991   25           72                       24.00                          1.00
1992   24           71                       23.67                          0.33
1993   22           71                       23.67                         -1.67
1994   25           74                       24.67                          0.33
1995   27           78                       26.00                          1.00
1996   26           -                        -                             -

even (4 years etc.) we compute the moving averages by using the following steps:

ii. Obtain the length of the period of moving average. Let the length of the

moving averages period be 4-years.

iii. Compute 4 yearly moving totals and place them at the centre of time span.

The four – yearly moving totals are computed as follows:

a + b + c + d, b + c + d + e, c + d + e + f,

iv. Compute 4 – yearly moving average and place them at the centre of the

time span. Note that this placement is inconvenient, because the moving

average so placed would not coincide with original time period.

v. Take two – period moving average of moving averages and place them at

the middle of the periods. This process is called centring of moving averages.
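Centring for an even period can be sketched in the same way. The function name is illustrative and the data are the ten-year production series (in lakh ton) quoted in this unit. Each plain 4-yearly average falls between two years, and averaging neighbouring pairs brings the result back onto an actual year.

```python
def centred_moving_average(values, period):
    """Return centred moving averages for an even period (None where undefined)."""
    if period % 2 != 0:
        raise ValueError("centring is only needed for an even period")
    n = len(values)
    # Plain moving averages: the i-th one covers values[i] to values[i + period - 1]
    # and therefore falls half-way between two original time periods.
    plain = [sum(values[i:i + period]) / period for i in range(n - period + 1)]
    # Two-period moving average of the plain averages (the centring step).
    centred = [None] * n
    half = period // 2
    for i in range(len(plain) - 1):
        centred[i + half] = (plain[i] + plain[i + 1]) / 2
    return centred


production = [15, 18, 16, 22, 19, 24, 20, 28, 22, 30]
centred_trend = centred_moving_average(production, 4)
```

With a 4-yearly average the first two and the last two years have no trend value.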

ii. This method is objective in the sense that anybody working on a problem with

this method will get the same results.

iii. This method is used for determining seasonal, cyclic and irregular variations

besides the trend values.

iv. This method is flexible enough to add more figures to the data because the

entire calculations are not changed.

fluctuations in the data, such fluctuations are automatically eliminated.

Limitations:

i. There is no functional relationship between the values and the time. Thus, this

method is not helpful in forecasting and predicting the values on the basis of time.

ii. There are no trend values for some years in the beginning and some in the end.

For example, for 5 – yearly moving average there will be no trend values for the

first two years and the last two years.

iii. In case of non – linear trend the values obtained by this method are biased in

one or the other direction.

iv. The selection of the period of moving average is a difficult task. Therefore

great care has to be taken in selecting the period, particularly, when there is no

business cycle during that time.

Under this method the trend curve is determined by fitting a mathematical equation. This

method is more accurate and precise and can be used even for forecasting. We can fit

either a straight line or a parabolic curve from the given data by this method. Let y be the

actual values and yc be the computed values of y for a given value of x.

Let y = a + bx be a straight line to be fitted for trend. Finding the values of a and b such

that the sum of squares of differences of the actual and computed values of y is least, i.e.

∑ (y – yc)^2 is least, where the condition ∑ (y – yc) = 0 is satisfied, is known as the method of

least squares. The line obtained by the method is known as the ‘line of best fit.’

For a given time series data, to find a linear trend, the values of a and b are obtained by

the normal equations.
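The normal equations themselves are not reproduced at this point in the text. For the straight line y = a + bx the standard least-squares normal equations are as follows, where n (a symbol assumed here) denotes the number of observations:

```latex
\begin{aligned}
\sum y  &= na + b\sum x \\
\sum xy &= a\sum x + b\sum x^{2}
\end{aligned}
```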

Here a is the intercept of the line on the y – axis and b is the slope of the line; b is

also known as growth rate (if b > 0) or decline rate (if b<0), b gives the change in

the value of y, for per unit change in the value of x.

Direct method

i. Convert the years into natural numbers ( 1, 2, 3……) and denote by x and find

∑ x.

iii. Multiply the x – values with corresponding y – values and obtain ∑ xy.

v. Put these values in the two normal equations and solve for a and b.

vi. Substitute these values of a and b in y = a + bx and then find trend values for

various values of x.
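The direct method can be sketched as below for the production series of 1991 to 2001 (in lakh ton). The function name is illustrative. The two normal equations, ∑ y = na + b ∑ x and ∑ xy = a ∑ x + b ∑ x^2 with n the number of years, are solved in closed form.

```python
def fit_linear_trend(y_values):
    """Fit y = a + bx by least squares, with x = 1, 2, 3, ... for successive years."""
    n = len(y_values)
    x_values = list(range(1, n + 1))
    sum_x = sum(x_values)
    sum_y = sum(y_values)
    sum_xy = sum(x * y for x, y in zip(x_values, y_values))
    sum_x2 = sum(x * x for x in x_values)
    # Solving the two normal equations for b and then for a.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


production = [15, 18, 16, 22, 19, 24, 20, 28, 22, 30, 26]
a, b = fit_linear_trend(production)

# Trend values for the observed years, plus a forecast for x = 12 (the year 2002).
trend_values = [a + b * x for x in range(1, len(production) + 1)]
forecast_2002 = a + b * 12
```

For these figures ∑ x = 66, ∑ y = 240, ∑ xy = 1574 and ∑ x^2 = 506, which gives b of about 1.22 and a of about 14.51.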

The variable x may be measured from any point of time as origin, such as the first year, but the

calculations are simplified when the mid – point in time is taken as origin so that ∑ x = 0.

When ∑ x = 0, the normal equations reduce to (n being the number of years):

∑ y = na, therefore a = ∑ y / n

∑ xy = b ∑ x^2, therefore b = ∑ xy / ∑ x^2

Merits

ii. This method gives the trend values for the entire time period.

iii. This method can be used to forecast future trend because trend line establishes a

functional relationship between the value and the time.

Demerits

ii. If even a single item is added to the series a new equation has to be formed.

iii. Future forecasts made by this method are based only on trend values. Seasonal,

cyclical or irregular variations are ignored.

Fitting a Parabolic Curve or Non – linear Curve by the Method of Least Squares.

When the time series data do not conform to the linear trend then we obtain Non –

linear trend. For this we use the equation of the form.

solving the normal equations:

If we can change the origin at a suitable point such that Σ x = 0, then the normal

equations reduce to:
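The equations for the parabolic trend are missing from the text at this point. Assuming the usual second-degree form (the same form used for the non-linear trend forecast later in this unit), the curve and its standard least-squares normal equations are given below, with n (assumed notation) the number of observations. The reduced set assumes equally spaced time periods with the origin at the mid-point, so that both ∑ x and ∑ x^3 vanish.

```latex
y = a + bx + cx^{2}

% Normal equations for a, b and c
\begin{aligned}
\sum y      &= na + b\sum x + c\sum x^{2} \\
\sum xy     &= a\sum x + b\sum x^{2} + c\sum x^{3} \\
\sum x^{2}y &= a\sum x^{2} + b\sum x^{3} + c\sum x^{4}
\end{aligned}

% With \sum x = 0 and \sum x^{3} = 0
\begin{aligned}
\sum y      &= na + c\sum x^{2} \\
\sum xy     &= b\sum x^{2} \\
\sum x^{2}y &= a\sum x^{2} + c\sum x^{4}
\end{aligned}
```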

Mathematical Models for Time series

The following are the two models commonly used for the decomposition of a time series

into its components.

1. Additive Model: This model assumes that the observed value is the sum of

four components of time series, i.e.,

Y = T + S + C + I,

where Y = observed value, T = trend, S = seasonal component, C = cyclical component, I = irregular component.

The additive model for decomposition of time series assumes that all the four

components of the time series operate independently of one another. It also

assumes that the behaviour of components is of an additive character. It is to

be noted that only absolute values are added or deducted from the trend value

to arrive at the observed value.

obtained by multiplying the trend (T) by the rates of three other components,

i.e.,

Y = T × S × C × I.

different causes are not necessarily independent and they can affect one

another. It also assumes that the behaviour of components is of multiplicative

character. It may be noted that except the value of trend all the other values on

the right hand side are rates or index numbers.

conform to the multiplicative model. In practice, the additive model is rarely

used.

It is necessary to make certain adjustments in the available data. Some important

adjustments are:

i. Time Variation : When data are available on monthly basis, the effect of

time variation needs to be adjusted because all months of the year do not

have the same number of days. This adjustment of time variation is done by

dividing each monthly total by the number of days in that month to get the daily average, which is then multiplied by 365 /

12, the average number of days in a month.

necessary when a variable is affected by change in population. If we are

studying National Income figures such adjustment is necessary. In this case,

adjustment is to divide the income by the number of persons concerned.

Then we can have per capita income figures.

wherever we have real value changes. Current values are to be deflated by

the ratio of current prices to base year prices.

iv. Comparability: In order to have valid conclusion the data which are

being analysed should be comparable. When we are dealing with the

analysis of time series it involves the data relating to past which must be

homogeneous and comparable. Therefore, efforts should be made to make the data as

homogeneous and comparable as possible.
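The time, population and price adjustments can be sketched as small helper functions. The function names and the figures are illustrative assumptions; only `calendar.monthrange` from the Python standard library is relied upon, and it returns the number of days in a month as its second value.

```python
import calendar


def adjust_for_month_length(monthly_total, year, month):
    """Rescale a monthly total to a month of average length (365 / 12 days)."""
    days_in_month = calendar.monthrange(year, month)[1]
    daily_average = monthly_total / days_in_month
    return daily_average * 365 / 12


def per_capita(total, population):
    """Population adjustment: divide the total by the number of persons."""
    return total / population


def deflate(current_value, current_price, base_price):
    """Price adjustment: divide by the ratio of current to base year prices."""
    return current_value / (current_price / base_price)


# Hypothetical figures: the same daily rate of sales in January and February.
january = adjust_for_month_length(310, 2001, 1)
february = adjust_for_month_length(280, 2001, 2)
real_value = deflate(240.0, current_price=120.0, base_price=100.0)
```

After the adjustment January and February give the same figure, because both months had sales of 10 units per day.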

possible the effect of trend, cyclical variations and irregular fluctuations on the time

series. The main methods of measuring seasonal variations are:

In this method we will use following steps: i) The time series is arranged by years and

months or quarter. ii) Totals of each month or quarter over all the years are obtained. iii)

The average for each month or quarter is obtained. The average may be mean or median.

In general, we take mean if not specified otherwise. iv) Taking the average of monthly or

quarterly average equal to 100, seasonal index for each month or quarter is calculated by

the following formula:

Symbolically

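I_j = (S_j / S) × 100,   j = 1, 2, 3, 4, ..., k

where S_j is the average for the j-th month or quarter and S is the average of these k averages.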
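The simple averages method can be sketched as follows. The function name and the quarterly sales figures are hypothetical; each row of the table is one year and each column one quarter.

```python
def seasonal_indices_simple_average(table):
    """Seasonal index for each season by the method of simple averages.

    table is a list of years, each year being a list of k seasonal values.
    """
    k = len(table[0])
    # Step iii: average of each season over all the years.
    season_means = [sum(year[j] for year in table) / len(table) for j in range(k)]
    # Step iv: express each seasonal average as a percentage of their average.
    grand_mean = sum(season_means) / k
    return [mean / grand_mean * 100 for mean in season_means]


quarterly_sales = [
    [30, 40, 36, 34],  # year 1 (hypothetical)
    [34, 52, 50, 44],  # year 2
    [40, 58, 54, 48],  # year 3
]
indices = seasonal_indices_simple_average(quarterly_sales)
```

The indices always add up to 100 times the number of seasons, here 400 for quarterly data.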

Merits

ii. This method is useful where no definite, trend exists in the time series.

Demerits / Limitations

i. Most economic time – series have trends and therefore, the seasonal index

computed by this method is really an index of trends and seasons.

ii. The simple averages method of isolating seasonal fluctuations in time – series is

based on the assumption that the series contains only the seasonal and irregular

fluctuations.

iii. This method does not give a true reflection of the normal seasonal variation

because it is obtained from the original data which are affected by not only seasonal

movements but also by remaining three components.

iv. The effects of cycles of the original data are not eliminated by the process of

averaging.

This method is also known as Percentage of Moving Average Method. The steps

involved in the computation of seasonal indices by this method are as follows:

i. The moving averages of the data are computed. If the data are monthly then

12 – monthly moving average, if they are quarterly, then 4 – quarterly moving

averages will be computed. In both the cases time periods of moving averages are

even, hence these moving averages are to be centred.

ii. Under additive model, from each original value, the corresponding moving

average is deducted to find out short time fluctuations:

iii. Y – T = S + C + I

iv. By preparing a separate table, monthly (or quarterly) short time fluctuations

are added for each month (or quarter) over all the years and their average is

obtained. These averages are known as seasonal variations for each month or

quarter.

month or quarter is deducted from the short – time fluctuations.

monthly value and the result is multiplied by 100. These percentages are known

as Link Relatives of the seasonal values. Thus:

ii. The mean of the Link Relatives for each season is computed over all the

years. Median can also be taken instead of mean of the Link Relatives.

iii. These average link relatives are converted into chain relatives. The chain

relative of first is taken as 100.

= (Average Link Relative of current * Chain Relative of previous year)/100

iv. The second chain relative of first is computed on the basis of the chain

relative for the last:

Chain relative of the first quarter = (Average Link Relative of the first x Chain

Relative of the last)/100

This chain relative may or may not be 100. It is not equal to 100 due to secular

trend. If it is 100 go to step (vi), if it is not 100 go to step (v) and then step (vi).

v. Compute the difference d between the new chain relatives of first obtained in

step (iv) and chain relative assumed as 100. d is divided by the number of seasons

and the resulting figure is multiplied by 1, 2, 3 and the product is deducted

respectively from the chain relatives of 2nd, 3rd, and 4th quarters. These are called

corrected relatives.

vi. The chain relatives obtained in step (iv), if correction is not necessary, or the

corrected chain relatives obtained in step (v), are expressed as percentages of their

average to have adjusted chain relatives.
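The link relative procedure can be sketched as one function. This is an illustration under stated assumptions: the function name and the figures are hypothetical, the mean (not the median) of the link relatives is used, and the table must cover at least two years so that the first season also has a link relative.

```python
def link_relative_indices(table):
    """Seasonal indices by the link relatives method.

    table is a list of years in time order, each a list of k seasonal values.
    """
    k = len(table[0])
    flat = [value for year in table for value in year]
    # Link relative: each value as a percentage of the value of the preceding season.
    links = [[] for _ in range(k)]
    for position in range(1, len(flat)):
        links[position % k].append(flat[position] / flat[position - 1] * 100)
    average_link = [sum(season) / len(season) for season in links]
    # Chain relatives, taking the first season as 100.
    chain = [100.0]
    for j in range(1, k):
        chain.append(average_link[j] * chain[j - 1] / 100)
    # Second chain relative of the first season, based on the last season.
    new_first = average_link[0] * chain[-1] / 100
    # Spread the difference from 100 evenly over the seasons (trend correction).
    correction = (new_first - 100) / k
    corrected = [chain[j] - j * correction for j in range(k)]
    # Express the corrected chain relatives as percentages of their average.
    average = sum(corrected) / k
    return [value / average * 100 for value in corrected]


quarterly_sales = [
    [30, 40, 36, 34],  # hypothetical figures
    [34, 52, 50, 44],
    [40, 58, 54, 48],
]
link_indices = link_relative_indices(quarterly_sales)
```

When the data contain no trend, the second chain relative of the first season comes out as 100 and the correction is zero.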

ii. To find ratio to trend, divide the original data by the corresponding trend

values and multiply these ratios by 100, i.e.

iii. Calculate the Arithmetic Mean of the Trend Ratios obtained in step (ii).

iv. Finally all the trend ratios will be converted into seasonal indices. For this

add all averages obtained in (iii) and find their General Average. Seasonal indices

are calculated by using the following formula:

Learning Objective 4

time period t, we forecast the value of the series to be equal to the mean of the

series, i.e.,

In this method the trend effect and cyclic effects do not come into account.

1. Naïve Forecast: In this method we forecast the value, for the time period t,

to be equal to the actual value observed in the previous period, i.e., time period

(t-1). This is given as

y_t = y_{t-1}

from the value of t; a and b are constants. This method is based on the least

squares method where a linear relationship is to be obtained between time and

the response value x by the above formula.

2. Non-Linear Trend Forecast: In this method a non-linear relationship

between the time and the response value has been found by the method of

least squares. The value of forecast y_t for the time period t can be y_t = a + bx

+ cx^2

Where x-value will be calculated from the value of t and the constants a.
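The two simplest forecasting methods of this section, the mean forecast and the naive forecast, can be sketched as below. The function names are illustrative, and the series is the T.V. sales series (in thousands) quoted earlier in this unit.

```python
def mean_forecast(history):
    """Forecast the next value as the mean of all observed values."""
    return sum(history) / len(history)


def naive_forecast(history):
    """Forecast the next value as the value observed in the previous period."""
    return history[-1]


tv_sales = [12, 14, 16, 12, 10, 18]
forecast_by_mean = mean_forecast(tv_sales)
forecast_by_naive = naive_forecast(tv_sales)
```

The mean forecast ignores trend and cyclic effects altogether, while the naive forecast reacts fully to the most recent observation.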

Learning Objective 5

i. Comparative study of the behaviour of the variable over different periods of

time can be done. The variable may be export figures, quantity of industrial

production etc:

ii. Forecasting can be done using the time series. By studying the variations and

other behaviour of the variables over a sufficiently long period of time, it may be

possible to forecast the future behaviour of the variables. However, such a

forecast has meaning only if the period of forecast is a normal period. For

example, various five-year plans by the Government of India are formulated by

studying the time series and forecasting.

iii. Study of the time series helps in analysing the past behaviour of the

variables. This helps in identifying the various forces that affect its behaviour.

Summary

In this unit we studied business forecasting. The different steps involved in

forecasting are discussed in a simple manner. The concept of time series analysis is

discussed next with good examples. Action and reaction theory is explained with its

merits and demerits in a simple manner. Lastly in this unit we discuss about the method

of least squares with merits and demerits discussed in detail.

MB0024-Unit-15

Introduction

We know that most values change, and we may therefore want to know how much change

has taken place over a period of time. For example, we may want to know how much the

prices of different items essential to a household have increased or decreased so that

necessary adjustments can be made in the monthly budget. However, while price of a few

items may have increased, others may have decreased over a given period of time.

Consequently, in all such situations, an average measure needs to be defined to compare

such difference over a time period. Index numbers are yardsticks for describing such

differences.

differences in a variable or a group of related variables, usually expressed in percentage form. These differences may have to do with the physical quantities of goods, the prices of commodities, or such concepts as 'efficiency', 'intelligence' or 'beauty'. The comparison may be between periods of time, between places, between like categories, etc. We may have index numbers comparing the cost of living at different times or in different localities or countries, the physical volume of production in different years, or the efficiency of different government offices. However, we confine most of our attention to the construction of index numbers measuring changes over time.

Learning Objective 1

phenomenon as compared to the level of the same phenomenon at some standard period. In other words, an index number is a number used as a device for comparing the price, quantity or value of a group of articles in different situations, e.g. at a certain place or period of time with that at another place or period of time. When the comparison is in respect of prices, it is called an index number of prices; when in respect of physical quantities, it is called an index number of quantities. Other index numbers are defined in a similar manner. Index numbers are meant for the comparison of variations arising out of a difference in situations, e.g. a change of time or a change of place.

Learning Objective 2

Relative

The value of a variable in a given year (or place) divided by the value of the same variable in a specified year (or place) is called a relative and is generally expressed as a percentage.

a. Price Relative: The price of a commodity in a given year expressed as a percentage of the price of the same commodity in a specified year is called the price relative.

Suppose the price of a commodity in India was Rs. 80 per kg in 2000 and Rs. 95 per kg in 2001. Then the price relative for 2001 (using 2000 as base) is (95 / 80) x 100 = 118.75%.

b. Production Relative: If the wheat production in India was 5,82,000 metric tons in 2002 and 6,96,000 metric tons in 2004, then, taking the production of 2002 as 100, the production relative for 2004 is (696000 / 582000) x 100 = 119.6%.

c. Quantity Relative: The quantity (q1) of a commodity consumed in a given year expressed as a percentage of the quantity (q0) of the same commodity consumed in a specified year is called the quantity relative.

d. Value Relative: If p1 and q1 are the price and quantity of a commodity in a given year, and p0 and q0 are the price and quantity of the same commodity in a specified year, then V1 = p1q1 is the value in the given year and V0 = p0q0 is the value in the specified year.

The ratio (V1 / V0) x 100 = (p1q1 / p0q0) x 100 is called the value relative of the given year with respect to the specified year.

The overall change in price, production, quantity or value etc. is represented by these

typical summaries which are known as relatives.
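The relatives defined above can be checked with a few lines of Python. This is only a minimal sketch: the function name relative is chosen here for illustration, and the quantities used for the value relative (10 and 12 units) are hypothetical figures, not data from the text.

```python
def relative(current, base):
    """Return `current` as a percentage of `base`."""
    if base == 0:
        raise ValueError("base value must be non-zero")
    return current / base * 100


# Price relative for 2001 with 2000 as base (Rs. 95 vs Rs. 80 per kg).
price_relative = relative(95, 80)

# Production relative for 2004 with 2002 as base (metric tons of wheat).
production_relative = relative(696000, 582000)

# Value relative: V1 = p1 * q1 and V0 = p0 * q0.
# The quantities 10 and 12 are hypothetical, used only for illustration.
p0, q0 = 80, 10
p1, q1 = 95, 12
value_relative = relative(p1 * q1, p0 * q0)

print(round(price_relative, 2))       # 118.75
print(round(production_relative, 1))  # 119.6
print(round(value_relative, 2))       # 142.5
```

The first two results reproduce the figures worked out in the text; the third depends entirely on the assumed quantities.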

1. Based on Variables.

3. Cost of living index number: Where we use retail prices.

4. Wholesale price index number: Where we use wholesale prices.

5. Based on Weights

6. Simple (unweighted) index number.

7. Weighted index number.

8. When the number of commodities is more than one, we obtain a single (combined) index number. This can be done in four ways:

iii.Simple aggregate

iv.Weighted aggregate.

In the computation of an index number we require two years (or places). The given year whose values are to be compared is called the current year (or current period), and the specified year whose values are taken as the standard (say 100) is called the base year (or base period). For example, if the prices of 2005 are compared with the prices of 2004, then 2005 is the current year and 2004 is the base year. The index number of 2005 based on 2004 is, in general, denoted by I01 or P01, where 0 stands for 2004 and 1 stands for 2005.

production is increased, prices are down etc, in the numbers.

as to show the extent or relative change where the value of base is assumed to be

100 but the sign of percentage (%) is not used.

iii.Relative measure: Index numbers measure changes which are not capable of

direct measurement.

general, a weighted average. It is a special type of average because, whereas in a simple average the data are homogeneous and share the same unit of measurement, index numbers average variables having different units of measurement.

v. Basis of Comparison: Index numbers by their very nature are comparative. They compare changes over time or between places or like categories.

Learning Objective 3

Many problems are encountered in the steps involved in the construction of index numbers, and each must be considered carefully:

1. Purpose of Index Number: The steps taken in the construction of an index number generally depend on its purpose. Hence the purpose of an index number must be defined clearly and precisely. For example, the purpose of a wholesale price index number is to measure the general price level, while that of a consumer price index number is to give an idea of the effect of changes in retail prices on the cost of living of classes of people.

1. Selection of Base Period: The base period of an index number is the period of time against which the comparisons are made. There are three types of base period.

iii. Chain base

While selecting the base, a decision has to be made as to whether to use a fixed base or a chain base. For a fixed base (a single period):

iv. The base period must be a normal period. By a normal period we mean a period which is free from all sorts of abnormalities or random causes such as financial crises, floods, famines, earthquakes, labour strikes, wars, etc.

v. The base period should be a period for which reliable figures are available.

When it is difficult to choose just one single period as normal, a better choice is an average of several periods.

If comparisons are required from year to year, a system of chain base is used. In this method there is no fixed base for comparing the values of subsequent years; instead, the value of each year is compared with the value of the preceding year.
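The difference between a fixed base and a chain base can be sketched as follows. The prices are hypothetical, the function names are chosen here for illustration, and the step of multiplying successive link relatives to obtain a chain index is the usual convention rather than something spelled out in the text.

```python
def fixed_base_indices(values, base_position=0):
    """Index every period against one fixed base period (base = 100)."""
    base = values[base_position]
    return [v / base * 100 for v in values]


def link_relatives(values):
    """Compare each period with the period immediately preceding it."""
    return [cur / prev * 100 for prev, cur in zip(values, values[1:])]


def chain_indices(values):
    """Chain the link relatives together, starting from 100."""
    chained = [100.0]
    for link in link_relatives(values):
        chained.append(chained[-1] * link / 100)
    return chained


prices = [50, 60, 66, 55]  # hypothetical prices for four successive years

print([round(x, 2) for x in fixed_base_indices(prices)])  # [100.0, 120.0, 132.0, 110.0]
print([round(x, 2) for x in link_relatives(prices)])      # [120.0, 110.0, 83.33]
print([round(x, 2) for x in chain_indices(prices)])       # [100.0, 120.0, 132.0, 110.0]
```

For a single commodity the chained series coincides with the fixed base series; the two generally differ once several commodities are combined.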

2. Selection of Commodities:

include all commodities. The purpose of the index number is to help in deciding

the number of commodities.

must be made in such a way that:

i.It represents the real tastes, habits and the customs of the people,

ii.It should be of a standard quality and there must be no significant

variation in the quality,

we have to consider the following points:

of the different commodities included in the construction of index

numbers. There are two methods of assigning weights:

commodity under study are used. Such weights are called implicit weights.

ii. Explicit method: In this method, the weights are laid down on the basis of some outward evidence of the importance of commodities. One of the problems in the selection of appropriate weights is to decide on this evidence. Another problem with regard to the system of weighting is whether weights should be fixed or fluctuating.

3. Selection of the Average: To find composite index number we can use any

average such as arithmetic mean, geometric mean, harmonic mean, median and

mode. The use of an average depends on the relative merits and demerits of the

various averages. The average may be weighted or unweighted.

4. Selection of Suitable Formula: There are various formulae for computing index numbers, so the selection of a suitable formula also poses a problem. A particular formula is suitable in a particular situation.

Learning Objective 4

The various methods of constructing index numbers can be classified in two

groups.

In unweighted index numbers each item is supposed to have the same weight but

in weighted index numbers the weights are assigned to various items in

accordance with their importance.

aggregative method we proceed as follows:

i. Add the prices of all commodities in the current year, i.e., find ∑p1.

ii. Add the prices of all commodities in the base year, i.e., find ∑p0.

iii. Divide the total of current year prices by the total of base year prices and multiply the quotient by 100, i.e., I01 = (∑p1 / ∑p0) x 100.

iv. Here I01 is the simple price index number of the current year (1) based on the base year (0).

Merits: This is the simplest method of constructing index numbers because it is

simple to understand and requires simple calculations.

Demerits:

commodities are quoted in different units.

ii. Since weights are not used, this method does not give any

consideration to the relative importance of commodities.

or low values.

Example 1: Find the simple aggregative price index from the following data:

Commodity   Unit        Price in 2000   Price in 2004
A           One kg.     10              15
B           One kg.     40              30
C           One dozen   10              12
D           One litre   5               13

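Solution: ∑p0 = 10 + 40 + 10 + 5 = 65 and ∑p1 = 15 + 30 + 12 + 13 = 70.

Therefore I01 = (∑p1 / ∑p0) x 100 = (70 / 65) x 100 = 107.7 (approximately).

This implies that prices increased by 7.7% in the year 2004 as compared to the year 2000.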
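The calculation in Example 1 can be reproduced with a short Python sketch. The dictionary layout and the function name are assumptions made here for illustration; the prices are those given in the example.

```python
base_prices = {"A": 10, "B": 40, "C": 10, "D": 5}      # prices in 2000
current_prices = {"A": 15, "B": 30, "C": 12, "D": 13}  # prices in 2004


def simple_aggregative_index(p0, p1):
    """Total of current year prices over total of base year prices, times 100."""
    return sum(p1.values()) / sum(p0.values()) * 100


index = simple_aggregative_index(base_prices, current_prices)
print(round(index, 1))  # 107.7
```

Note that the four prices are simply added although they refer to different units (kg, dozen, litre), so the result changes if any one unit is changed.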

we proceed as follows:

year/price of base year)X100

R = (p1 / p0) x 100

the price relatives obtained in (i) and denote it by L01

Merits / Advantages:

into price relative.

iii. It gives equal importance to all items and extreme items do not unduly

affect the index number.

iv. The index number calculated by this method satisfies the unit test.

Demerits / Limitations

i. As it is an unweighted average the importance of all items is assumed to

be the same.

ii. The index number constructed by this method does not satisfy all the criteria laid down for an ideal index.

iii. The index number is unduly influenced by high or low prices when

arithmetic mean is used.
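As a rough illustration of the simple average of price relatives, the sketch below reuses the prices of Example 1. Averaging the relatives with both the arithmetic and the geometric mean is a choice made here to show how the arithmetic mean reacts to one very high relative; the variable names are hypothetical.

```python
from statistics import mean, geometric_mean

base_prices = [10, 40, 10, 5]      # prices in 2000 for A, B, C, D
current_prices = [15, 30, 12, 13]  # prices in 2004 for A, B, C, D

# Price relative of each commodity: R = (p1 / p0) x 100
price_relatives = [p1 / p0 * 100 for p0, p1 in zip(base_prices, current_prices)]

arithmetic_index = mean(price_relatives)
geometric_index = geometric_mean(price_relatives)

print([round(r, 2) for r in price_relatives])  # [150.0, 75.0, 120.0, 260.0]
print(round(arithmetic_index, 2))              # 151.25
print(round(geometric_index, 2))               # lower than the arithmetic mean
```

Because each relative is a pure number, the result does not depend on the units in which the commodities are quoted. The high relative of commodity D (260) pulls the arithmetic mean up more than the geometric mean.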

To overcome the weakness of the simple or unweighted method, we weight the price of each commodity by a suitable factor, often taken as the quantity or the volume of the commodity sold during the base year. In other words, in this method appropriate weights are assigned to the various commodities to reflect their relative importance in the group. The weights can be production figures, consumption figures or distribution figures. For the construction of a price index number, quantity weights are used. If w is the weight attached to a commodity, then the price index is given by P01 = (∑p1w / ∑p0w) x 100.

the weights are assigned to various items and the weighted aggregate of the prices is obtained. Weights are assigned in various ways and the weighted aggregates are used in different ways for the construction of index numbers. Some of the important methods of constructing weighted aggregative index numbers are given below:

Laspeyres' Price Index: Laspeyres' method is based on fixed weights of the base year. Base year quantities are used as weights. The formula given by Laspeyres is:

Laspeyres' Price Index: P01 = (∑p1q0 / ∑p0q0) x 100

This index number has an upward bias: when prices increase, there is a tendency to reduce the consumption of higher priced goods, but the base year quantities do not reflect this reduction. The index number is very widely used in practical work.

Current year quantities are used as weights. The formula given by Paasche is:

Paasche's Price Index: P01 = (∑p1q1 / ∑p0q1) x 100

This index number has a downward bias. This formula is not used frequently in practice where the number of commodities is large.

Paasche's method. If we take the arithmetic average of Laspeyres' index and Paasche's index, we get the index suggested by Dorbish and Bowley. This index number takes into account both the base year and the current year weights.
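The three weighted aggregative indices described above can be compared on a small data set. This is a sketch under stated assumptions: the prices and quantities are hypothetical, and the function names are chosen here for illustration.

```python
def total(prices, quantities):
    """Sum of price x quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres_index(p0, q0, p1):
    """Base year quantities as weights."""
    return total(p1, q0) / total(p0, q0) * 100


def paasche_index(p0, p1, q1):
    """Current year quantities as weights."""
    return total(p1, q1) / total(p0, q1) * 100


def dorbish_bowley_index(p0, q0, p1, q1):
    """Arithmetic average of the Laspeyres and Paasche indices."""
    return (laspeyres_index(p0, q0, p1) + paasche_index(p0, p1, q1)) / 2


# Hypothetical data for three commodities.
p0, q0 = [2, 5, 4], [8, 10, 14]  # base year prices and quantities
p1, q1 = [4, 6, 5], [4, 11, 14]  # current year prices and quantities

print(round(laspeyres_index(p0, q0, p1), 2))           # 132.79
print(round(paasche_index(p0, p1, q1), 2))             # 127.73
print(round(dorbish_bowley_index(p0, q0, p1, q1), 2))  # 130.26
```

In this data set consumption of the commodity whose price doubled has fallen, so the index with base year weights comes out higher than the one with current year weights, and the average lies between the two.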

Merits

ii. This formula takes into account both current year and base year prices and quantities.

iii. It satisfies both the 'time reversal test' and the 'factor reversal test'. This is why it is called an ideal index number.

Demerits:

laborious.

iii. It requires the prices and quantities for both the base year and the current year.

The quantity index numbers measure the average change in quantities and enable us to compare changes in the physical quantity of goods produced or sold. These

index numbers can also be simple or weighted. Weights in quantity index number

in price. Therefore quantity index numbers can be easily obtained from price

index numbers just by interchanging p’s and q’s in the above formulae.
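Interchanging the p's and q's can be sketched as follows for the two weighted aggregative formulae that use base year and current year weights. The data are hypothetical and the function names are chosen here for illustration.

```python
def total(values, weights):
    """Sum of value x weight over all commodities."""
    return sum(v * w for v, w in zip(values, weights))


def quantity_index_base_weights(q0, p0, q1):
    """Quantity index with base year prices as weights."""
    return total(q1, p0) / total(q0, p0) * 100


def quantity_index_current_weights(q0, q1, p1):
    """Quantity index with current year prices as weights."""
    return total(q1, p1) / total(q0, p1) * 100


# Hypothetical data for three commodities.
p0, q0 = [2, 5, 4], [8, 10, 14]  # base year prices and quantities
p1, q1 = [4, 6, 5], [4, 11, 14]  # current year prices and quantities

print(round(quantity_index_base_weights(q0, p0, q1), 2))     # 97.54
print(round(quantity_index_current_weights(q0, q1, p1), 2))  # 93.83
```

Both values are below 100, indicating that the physical volume has fallen between the two years in this illustrative data set.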

The value index numbers are very easy to calculate. Value is the product of price and quantity. A simple value index number is equal to the value of the current year divided by the value of the base year, multiplied by 100. The required formula is: V01 = (∑p1q1 / ∑p0q0) x 100

Such index numbers are not weighted because they do not take into account either the price or the quantity. These index numbers are not very popular because the situation revealed by prices and quantities is not fully revealed by the values.
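A value index can be computed in the same way. The figures below are hypothetical and the function name is chosen here for illustration.

```python
def value_index(p0, q0, p1, q1):
    """Current year value over base year value, times 100."""
    current_value = sum(p * q for p, q in zip(p1, q1))
    base_value = sum(p * q for p, q in zip(p0, q0))
    return current_value / base_value * 100


# Hypothetical data for three commodities.
p0, q0 = [2, 5, 4], [8, 10, 14]  # base year prices and quantities
p1, q1 = [4, 6, 5], [4, 11, 14]  # current year prices and quantities

print(round(value_index(p0, q0, p1, q1), 2))  # 124.59
```

The result shows only that total value rose by about 24.6%; it does not reveal how much of that came from prices and how much from quantities.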

units. Except for the simple aggregative index, all the other index numbers satisfy this test.

3. Time Reversal Test: This test requires that the formula for calculating the index number should give the same ratio between one period of comparison and the other, whichever of the two is taken as the base. Symbolically, P01 x P10 = 1.

Index, simple geometric mean of price relatives, weighted geometric

mean of price relatives and Marshall-Edgeworth Index number.

interchange of price and quantity without giving inconsistent results.

that if an index is constructed for the year 'a' on base year 'b', and for the year 'b' on the base year 'c', we should get the same result as if we calculated directly for the year 'a' on the base year 'c' without going through 'b'.

Symbolically, P01 x P12 x P20 = 1

aggregate methods.
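The time reversal and factor reversal tests can be checked numerically. The sketch below assumes the standard definition of Fisher's ideal index as the geometric mean of the Laspeyres and Paasche indices, which is not spelled out in the surviving text; indices are expressed as ratios (without the factor 100), and the data and function names are hypothetical.

```python
from math import isclose, sqrt


def total(values, weights):
    """Sum of value x weight over all commodities."""
    return sum(v * w for v, w in zip(values, weights))


def laspeyres_ratio(x0, w0, x1):
    """Index of period 1 on period 0 with period 0 weights, as a ratio."""
    return total(x1, w0) / total(x0, w0)


def fisher_ratio(x0, w0, x1, w1):
    """Geometric mean of the Laspeyres and Paasche ratios."""
    paasche = total(x1, w1) / total(x0, w1)
    return sqrt(laspeyres_ratio(x0, w0, x1) * paasche)


# Hypothetical data for three commodities.
p0, q0 = [2, 5, 4], [8, 10, 14]  # base year prices and quantities
p1, q1 = [4, 6, 5], [4, 11, 14]  # current year prices and quantities

# Time reversal test: P01 x P10 should equal 1.
fisher_time = fisher_ratio(p0, q0, p1, q1) * fisher_ratio(p1, q1, p0, q0)
laspeyres_time = laspeyres_ratio(p0, q0, p1) * laspeyres_ratio(p1, q1, p0)

# Factor reversal test: price index x quantity index should equal the value ratio.
fisher_factor = fisher_ratio(p0, q0, p1, q1) * fisher_ratio(q0, p0, q1, p1)
value_ratio = total(p1, q1) / total(p0, q0)

print(isclose(fisher_time, 1.0))            # True: time reversal satisfied
print(isclose(laspeyres_time, 1.0))         # False: Laspeyres fails on this data
print(isclose(fisher_factor, value_ratio))  # True: factor reversal satisfied
```

The quantity index in the factor reversal check is obtained by interchanging the roles of prices and quantities in the same function.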

The 'cost of living index', also known as the 'consumer price index' or 'cost of living price index', is the country's principal measure of price change. It measures the average change over time in the prices paid by consumers for a specific basket of goods and services.

The consumer price index numbers are designed to measure the average change in the prices paid by the ultimate consumers for specified quantities of goods and services over a period of time. The consumer price index helps us in determining the effect of rise and fall in prices on different classes of consumers living in different areas.

commodities in different proportions. The consumer price index number is significant because the demand for a higher wage is based on the cost of living index, and the wages and salaries in most nations are adjusted according to this index number. The cost of living index does not measure the actual cost of living, nor the fluctuations in the cost of living due to causes other than the change in price level; its object is to find out how much the consumers of a particular class have to pay for a certain quantity of goods and services.

real income etc.

taxation and general economic policies.

iii. Market prices for particular kinds of goods and services are analysed using the consumer price index.

iv. Salaries and wages are fixed on the basis of the consumer price index. So, it is very helpful in revising wages or dearness allowance.

Assumptions: The cost of living index number is based on some assumptions, which are as follows:

i. Similar needs: The needs of the people for whom this index number is constructed are the same.

ii. Same goods: The goods consumed in the base year and the current year remain unchanged.

of goods consumed will remain same in the base year and current year.

different places are same and they do not change frequently.

v.True on the average: Cost of living index numbers are true on the

average.

living index number represent the consumption of the class of people.

There are three methods for constructing the consumer price index number:

base year quantities are taken as weights (w = Q0):

P01 = (∑P1Q0 / ∑P0Q0) x 100

ii. Family budget method [or the method of weighted relatives, where the weights are the values (P0Q0) in the base year, often denoted by V]:

P01 = ∑RV / ∑V, where R = (P1 / P0) x 100, which is the same as (i)

weights

Then
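As a rough check that weighting prices by base year quantities and weighting price relatives by base year values (the family budget method) lead to the same consumer price index, consider the sketch below. The item groups, prices and quantities are hypothetical, and the function names are chosen here for illustration.

```python
from math import isclose


def aggregate_expenditure_index(p0, q0, p1):
    """Weighted aggregate of prices with base year quantities as weights."""
    current_cost = sum(p * q for p, q in zip(p1, q0))
    base_cost = sum(p * q for p, q in zip(p0, q0))
    return current_cost / base_cost * 100


def family_budget_index(p0, q0, p1):
    """Weighted average of price relatives R with weights V = P0 x Q0."""
    relatives = [b / a * 100 for a, b in zip(p0, p1)]
    values = [a * q for a, q in zip(p0, q0)]
    return sum(r * v for r, v in zip(relatives, values)) / sum(values)


# Hypothetical budget: food, rent, clothing, fuel.
p0 = [20, 50, 30, 10]  # base year prices
q0 = [10, 2, 3, 5]     # base year quantities consumed
p1 = [25, 60, 33, 14]  # current year prices

by_aggregate = aggregate_expenditure_index(p0, q0, p1)
by_family_budget = family_budget_index(p0, q0, p1)

print(round(by_aggregate, 2))      # 122.5
print(round(by_family_budget, 2))  # 122.5
print(isclose(by_aggregate, by_family_budget))  # True
```

The agreement follows algebraically, since R x V = (P1 / P0) x 100 x P0Q0 = 100 x P1Q0 for every item.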

Learning Objective 5

There is no doubt that the technique of index numbers is a very useful tool. But there are certain limitations of index numbers which should be borne in mind.

• Difficulties in the construction of index numbers – Due to selection of base year,

items, changes in habits and selection of average.

• Sampling errors

• Index numbers can also be manipulated

• Limited application – An index number constructed for one purpose cannot be

used for other purposes.

• Lack of adequate and accurate data

Learning Objective 6

The primary purpose of index numbers is to measure relative temporal or cross-sectional

changes in a variable or a group of related variables which are not capable of being

directly measured. The greatest purpose of index numbers has been to measure and

compare the changes in prices and purchasing power of money which have received great

attention of economists for many years.

Nowadays, index numbers are not used for measuring price changes alone. Factors like wages, employment, production, trade, demand, supply, business conditions, industrial activity, financial problems, etc., are also studied through this statistical device. As a barometer measures the pressure of the atmosphere or gases, so index numbers measure the pressure of economic behaviour, and thus index numbers are called economic barometers.

• Comparative Study

• Simplifies data

• Provide guidelines to economic policy and in formulating decisions

• Measures purchasing power of money

• Change in cost of living

• National income

• It is used as control by government

• Reveal trends and tendencies

• Useful in deflating

• Comparative study is made possible

• Universal utility
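Two of the uses listed above, deflating and measuring the purchasing power of money, can be illustrated with a short sketch. It assumes the usual conventions that a real value is the money value divided by the price index and multiplied by 100, and that purchasing power relative to the base year is 100 divided by the price index; the wage and index figures are hypothetical.

```python
def deflate(money_values, price_indices):
    """Convert money values to base year prices using a price index (base = 100)."""
    return [m / i * 100 for m, i in zip(money_values, price_indices)]


def purchasing_power(price_indices):
    """Purchasing power of one unit of money relative to the base year."""
    return [100 / i for i in price_indices]


# Hypothetical monthly wages and consumer price index for three years.
money_wages = [20000, 22000, 25000]
cpi = [100, 110, 130]

print([round(w, 2) for w in deflate(money_wages, cpi)])  # [20000.0, 20000.0, 19230.77]
print([round(v, 2) for v in purchasing_power(cpi)])      # [1.0, 0.91, 0.77]
```

In this illustration the money wage rises every year, yet the real wage is unchanged in the second year and lower in the third.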

Summary

In this unit we studied the concept of index numbers and the classification of index numbers into different types. The different index number formulae available, and the utility and importance of index numbers, are explained in a simple way.
