
ABF 102

ACeL

STATISTICS

Vol-1


Amity University

Preface

The importance of Business Statistics, as a field of study and practice, is being increasingly realized in schools, colleges, universities, and commercial and industrial organizations, both in India and abroad. It is a technical and practical subject, and learning it means familiarizing oneself with many new terms and concepts. As this Study Material is intended to serve beginners in the field, I have kept it simple. It is intended for students of the BFIA course of Amity University. This Study Material on "Business Statistics" is 'student oriented' and written in a teach-yourself style.

The primary objective of this study material is to facilitate a clear understanding of the subject of Business Statistics. This Material contains a wide range of theoretical and practical questions varying in content, length and complexity. Most of the illustrations and exercise problems have been taken from various university examinations. This material contains a sufficiently large number of illustrations to assist a better grasp and understanding of the subject. The reader will find perfect accuracy with regard to formulae and answers to the exercise questions. For the convenience of the students I have also included multiple-choice questions and a case study in this Study Material for better understanding of the subject.

I hope that this Material will prove useful to both students and teachers. The contents of this Study Material are divided into eight chapters covering various aspects of the syllabus of BFIA and other related courses. At the end of this Material, three assignments related to the subject matter have been provided.

I have taken a considerable amount of help from various literature, journals and media. I express my gratitude to all those personalities who have devoted their lives to knowledge, especially Statistics, from whom I could learn and on the basis of whose teaching I am now trying to pass on my knowledge to others through this material. It is by God's loving grace that He brought me into this world and blessed me with loving and caring parents, my respected father Mr. Manohar Lal Arora and my loving mother Mrs. Kamla Arora, who have supported me in this Study Material. Words may not be enough for me to express my deep sense of gratitude and indebtedness to Dr. Shipra Maitra, Director (Amity College of Commerce & Finance, Amity University, Noida) for the benevolent guidance, constructive criticism and constant encouragement throughout the period I have been involved in this Study Material. I am thankful to my beloved wife Mrs. Deepti Arora, without whose constant encouragement, advice and material sacrifice, this achievement would have been a far-off dream.

Preface

CHAPTER ONE : INTRODUCTION TO STATISTICS
1.1 Introduction
1.2 Meaning of Statistics
1.3 Origin and Growth of Statistics
1.4 Definitions
1.4.1 Definition by Florence Nightingale
1.4.2 Definitions by A.L. Bowley
1.4.3 Definition by Croxton and Cowden
1.4.4 Definition by Horace Secrist
1.4.5 Definition by Professor Secrist
1.4.6 Definition by Croxton and Cowden
1.5 Characteristics of Statistics
1.5.1 Statistics are aggregate of facts
1.5.2 Statistics are numerically expressed
1.5.3 Statistics are affected to a marked extent by multiplicity of causes
1.5.4 Statistics are collected in a systematic order
1.5.5 Statistics must be collected for a predetermined purpose
1.5.6 Statistics should be placed in relation to each other
1.6 Functions of Statistics
1.6.1 Condensation
1.6.2 Comparison
1.6.3 Forecasting
1.6.4 Estimation
1.6.5 Tests of Hypothesis
1.7 Scope of Statistics
1.7.1 Statistics and Industry
1.7.2 Statistics and Commerce
1.7.3 Statistics and Agriculture
1.7.4 Statistics and Economics
1.7.5 Statistics and Education
1.7.6 Statistics and Planning
1.7.7 Statistics and Medicine
1.7.8 Statistics and Modern applications
1.8 Limitations of statistics
1.8.1 Statistics is not suitable to the study of qualitative phenomenon
1.8.2 Statistics does not study individuals
1.8.3 Statistical laws are not exact
1.8.4 Statistics table may be misused
1.8.5 Statistics is only one of the methods of studying a problem
1.9 Distrust Of Statistics
1.10 Uses of Statistics
1.10.1 To present the data in a concise and definite form
1.10.2 To make it easy to understand complex and large data
1.10.3 For comparison
1.10.4 In forming policies
1.10.5 Enlarging individual experiences
1.10.6 In measuring the magnitude of a phenomenon
1.11 Types of Statistics
1.12 Common Mistakes Committed In Interpretation of Statistics
Chapter One: End Chapter Quizzes

CHAPTER TWO: PRIMARY AND SECONDARY DATA
2.1 Primary Data
2.2 Sources of Primary Data
2.2.1 Direct personal investigations
2.2.2 Indirect oral investigations
2.2.3 Information through correspondence
2.2.4 Mailed questionnaire method
2.2.5 Schedule to be filled in by the enumerator
2.3 Secondary Data
2.4 The nature of secondary sources of information
2.5 Sources of Secondary data
2.5.2 External sources of secondary information
2.5.3 Examples of Sources of External Secondary Data
2.6 The problems of secondary sources
2.7 Difference between Primary & Secondary Data
Chapter Two: End Chapter Quizzes

CHAPTER THREE : MEASURES OF DISPERSION
3.1 Meaning
3.2 Definitions
3.3 Types of Dispersion
3.3.1 Absolute Dispersion
3.3.2 Relative Dispersion
3.4 Features of an ideal measure of dispersion
3.5 Methods of measuring Dispersion
3.5.1 Range
3.5.2 Quartile Deviations
3.5.3 Mean Deviation
3.5.4 Standard Deviation (S. D.)
3.5.5 Co-efficient Of Variation (C. V.)
Chapter Three: End Chapter Quizzes

CHAPTER FOUR: MEASURES OF SKEWNESS
4.1 Skewness
4.2 Definitions
4.3 Difference between Skewness and Dispersion
4.4 Tests of Skewness
4.5 Methods of measurement of Skewness
Chapter Four: End Chapter Quizzes

CHAPTER FIVE: CORRELATION
5.1 Introduction
5.2 Definitions
5.3 Coefficient of Correlation
5.4 Types of Correlation
5.6 Techniques in Determining Correlation
5.6.1 Rating Scales
5.7 Methods of Determining Correlation
5.7.3 Spearman's Rank Correlation Coefficient
Chapter Five: End Chapter Quizzes

CHAPTER ONE : INTRODUCTION TO STATISTICS

1.1 Introduction

In the modern world of computers and information technology, the importance of statistics is well recognised by all disciplines. Statistics originated as a science of statehood and slowly and steadily found applications in agriculture, economics, commerce, biology, medicine, industry, planning, education and so on. Today there is hardly any walk of human life where statistics cannot be applied.

Statistics is a discipline which is concerned with:

designing experiments and other data collection,

summarizing information to aid understanding,

drawing conclusions from data, and

estimating the present or predicting the future.

Today, statistics has become an important tool in the work of many academic disciplines such as medicine, psychology, education, sociology, engineering and physics, just to name a few. Statistics is also important in many aspects of society such as business, industry and government. Because of the increasing use of statistics in so many areas of our lives, it has become very desirable to understand and practise statistical thinking. This is important even if you do not use statistical methods directly.

Examples of Statistics: Unemployment rate, consumer price index, rate of violent crimes, infant mortality rate, poverty rate of a country, batting average of a baseball player, on-base percentage of a baseball player, salary rates, standardized test results.

1.2 Meaning of Statistics

The word 'Statistics' is derived from the Latin word 'Status', which means a "political state." Clearly, statistics is closely linked with the administrative affairs of a state, such as facts and figures regarding the defense force, population, housing, food, financial resources etc. What is true of a government is also true of industrial administration units, and even of one's personal life.

The word statistics has several meanings. In the first place, it is a plural noun which describes a collection of numerical data, such as employment statistics, accident statistics, population statistics, and statistics of births and deaths, of income and expenditure, of exports and imports etc. It is in this sense that the word 'statistics' is used by a layman or a newspaper.

Secondly, the word statistics, as a singular noun, is used to describe a branch of applied mathematics whose purpose is to provide methods of dealing with collections of data and extracting information from them in compact form by tabulating, summarizing and analyzing the numerical data or a set of observations.

The various methods used are termed as statistical methods and the person using them is known as a statistician. A statistician is concerned with the analysis and interpretation of the data and drawing valid worthwhile conclusions from the same.

It is in the second sense that we are writing this guide on statistics.

Lastly, the word statistics is used in a specialized sense. It describes various numerical items which are produced by applying statistics (in the second sense) to statistics (in the first sense). Averages, standard deviations etc. are all statistics in this specialized third sense.
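The three senses of the word can be illustrated with a small sketch. The accident counts below are hypothetical figures assumed only for illustration, and Python's standard `statistics` module stands in for the statistical methods.

```python
from statistics import mean, pstdev

# Statistics in the first sense: a collection of numerical data.
# Hypothetical monthly accident counts (assumed values).
accident_counts = [12, 15, 9, 14, 10]

# Statistics in the second sense: methods applied to the data.
average = mean(accident_counts)   # arithmetic mean
spread = pstdev(accident_counts)  # population standard deviation

# Statistics in the third sense: the numerical items produced.
print(f"mean = {average}, standard deviation = {spread:.2f}")
```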

1.3 Origin and Growth of Statistics:

The words 'Statistics' and 'Statistical' are both derived from the Latin word Status, which means a political state. The theory of statistics as a distinct branch of scientific method is of comparatively recent growth. Research, particularly into the mathematical theory of statistics, is proceeding rapidly and fresh discoveries are being made all over the world.

1.4 Definitions :

Statistics has been defined differently by different authors over a period of time. In the olden days statistics was confined to state affairs only, but in modern days it embraces almost every sphere of human activity. Therefore a number of old definitions, which were confined to a narrow field of enquiry, were replaced by newer definitions, which are much more comprehensive and exhaustive. Secondly, statistics has been defined in two different ways: as statistical data and as statistical methods. The following are some of the definitions of statistics as numerical data.

1. Statistics are the classified facts representing the conditions of people in a state. In particular they are the facts which can be stated in numbers or in tables of numbers or in any tabular or classified arrangement.

2. Statistics are measurements, enumerations or estimates of natural phenomena, usually systematically arranged, analysed and presented so as to exhibit important interrelationships among them.

1.4.1 Definition by Florence Nightingale

Florence Nightingale described statistics as "the most important science in the whole world: for upon it depends the practical application of every other science and every art: the one science essential to all political and social administration, all education, all organization based on experience, for it only gives results of our experience."

1.4.2 Definitions by A.L. Bowley:

"Statistics are numerical statements of facts in any department of enquiry placed in relation to each other." - A.L. Bowley

In another definition due to Bowley, "statistics may be called the science of counting". Obviously this is an incomplete definition, as it takes into account only the aspect of collection and ignores other aspects such as analysis, presentation and interpretation.

Bowley gives a further definition, which states that "statistics may be rightly called the science of averages". This definition is also incomplete: averages play an important role in understanding and comparing data, but statistics provides many more measures.

1.4.3 Definition by Croxton and Cowden:

Statistics may be defined as the science of collection, presentation, analysis and interpretation of numerical data from the logical analysis. It is clear that the definition of statistics by Croxton and Cowden is the most scientific and realistic one. According to this definition there are four stages:

1. Collection of data: This is the first step and the foundation upon which the entire investigation rests. Careful planning is essential before collecting the data. There are different methods of collection of data such as census, sampling, primary, secondary, etc., and the investigator should make use of the correct method.

2. Presentation of data: The mass of data collected should be presented in a suitable, concise form for further analysis. The collected data may be presented in tabular, diagrammatic or graphic form.

3. Analysis of data: The data presented should be carefully analysed in order to make inferences from it, using measures of central tendency, dispersion, correlation, regression etc.

4. Interpretation of data: The final step is drawing conclusions from the data collected. A valid conclusion must be drawn on the basis of the analysis. A high degree of skill and experience is necessary for the interpretation.
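The four stages can be traced on a toy data set. The following is only a sketch: the marks are hypothetical, and a frequency table, the mean and the most frequent value are chosen here merely as simple examples of presentation and analysis.

```python
from collections import Counter
from statistics import mean

# 1. Collection: hypothetical marks of ten students (assumed data).
marks = [45, 52, 45, 60, 52, 52, 70, 45, 60, 52]

# 2. Presentation: a simple frequency table, sorted by mark.
frequency_table = sorted(Counter(marks).items())
for mark, count in frequency_table:
    print(f"{mark:>5} | {count}")

# 3. Analysis: the average mark and the most frequent mark.
average_mark = mean(marks)
most_common_mark, _ = Counter(marks).most_common(1)[0]

# 4. Interpretation: a statement drawn from the analysis.
print(f"A typical student scored about {most_common_mark}; "
      f"the class average is {average_mark}.")
```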

1.4.4 Definition by Horace Secrist:

Statistics may be defined as the aggregate of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to a reasonable standard of accuracy, collected in a systematic manner, for a predetermined purpose and placed in relation to each other. The above definition seems to be the most comprehensive and exhaustive.

1.4.5 Definition by Professor Secrist : The word 'statistics' in the first sense is defined by Professor Secrist as follows:-

"By statistics we mean aggregate of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standard of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other."

This definition gives all the characteristics of statistics, which are: aggregate of facts, affected by multiplicity of causes, numerically expressed, estimated according to reasonable standards of accuracy, collected in a systematic manner, collected for a predetermined purpose, and placed in relation to each other.

1.4.6 Definition by Croxton and Cowden : The word 'statistics' in the second sense is defined by Croxton and Cowden as follows:-

"The collection, presentation, analysis and interpretation of the numerical data."

This definition clearly points out four stages in a statistical investigation, namely:

1) Collection of data

2) Presentation of data

3) Analysis of data

4) Interpretation of data

In addition to this, one more stage i.e. organization of data is suggested.

1.5 Characteristics of Statistics:

1.5.1 Statistics are aggregate of facts : A single fact is not called statistics. To become statistics, there must be more than one fact. The data may relate to production, sales, employment, birth, death etc.

1.5.2 Statistics are numerically expressed : Only those statements which can be expressed numerically are statistics. Statistics does not deal with qualitative statements like "students of MBA are intelligent". On the other hand, a statement such as "the sales of Escorts Ltd. are Rs. 354 crores" is a statistical fact stated numerically.

1.5.3 Statistics are affected to a marked extent by multiplicity of causes : Statistical data are affected to a great extent by various causes. For instance, the production of wheat depends upon the quality of seed, rainfall, quality of soil, fertilizer used, method of cultivation etc.

1.5.4 Statistics are collected in a systematic order : Statistical data are collected in a systematic manner. This means the investigator has to chalk out a plan keeping in view the objective of data collection, and determine the statistical unit, the technique of data collection and so on.

1.5.5 Statistics must be collected for a predetermined purpose : The objective of data collection must be predetermined and well established. A mere statement of purpose is insufficient.

1.5.6 Statistics should be placed in relation to each other : Statistical data must be comparable. This is possible only when the data are homogeneous.

1.6 Functions of Statistics:

There are many functions of statistics. Let us consider the following five important functions.

1.6.1 Condensation:

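Generally speaking, by the word 'to condense' we mean to reduce or to lessen. Condensation is mainly aimed at easing the understanding of a huge mass of data by providing only a few observations. If, for a particular class in a Chennai school, only the marks in an examination are given, no purpose will be served. Instead, if we are given the average mark in that particular examination, it definitely serves the purpose better. Similarly, the range of marks is another measure of the data. Thus, statistical measures help to reduce the complexity of the data and consequently to understand any huge mass of data.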
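As a minimal sketch of condensation, a list of examination marks can be reduced to a handful of summary figures. The marks below are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical marks of a class in one examination (assumed data).
marks = [35, 48, 52, 61, 67, 70, 74, 78, 81, 90]

# Condense the whole list into a few summary figures.
summary = {
    "students": len(marks),
    "average": mean(marks),
    "range": max(marks) - min(marks),
}
print(summary)
```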

1.6.2 Comparison:

Classification and tabulation are the two methods that are used to condense the data. They help us to compare data collected from different sources. Grand totals, measures of central tendency, measures of dispersion, graphs and diagrams, coefficient of correlation etc. provide ample scope for comparison. If we have one group of data, we can make comparisons within it. If the rice production (in tonnes) in Tanjore district is known, then we can compare one region with another region within the district. Or, if the rice production (in tonnes) of two different districts within Tamilnadu is known, then a comparative study can also be made. As statistics is an aggregate of facts and figures, comparison is always possible, and in fact comparison helps us to understand the data in a better way.

1.6.3 Forecasting:

By the word forecasting, we mean to predict or to estimate beforehand. Given the rainfall data of the last ten years for a particular district in Tamilnadu, it is possible to predict or forecast the rainfall for the near future. In business also, forecasting plays a dominant role in connection with production, sales, profits etc. The analysis of time series and regression analysis play an important role in forecasting.
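One simple way to forecast from ten years of data is to fit a straight-line trend by least squares and extend it one year ahead. The sketch below assumes a linear trend, which real rainfall need not follow; the rainfall figures and the function name `fit_linear_trend` are hypothetical.

```python
def fit_linear_trend(values):
    """Least-squares line y = intercept + slope * t for t = 0, 1, ..., n-1."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope

# Hypothetical annual rainfall (mm) for the last ten years (assumed data).
rainfall = [910, 925, 905, 940, 955, 950, 970, 985, 980, 1000]

intercept, slope = fit_linear_trend(rainfall)
# Extend the fitted line to the next time point, t = 10.
next_year = intercept + slope * len(rainfall)
print(f"Forecast for next year: {next_year:.1f} mm")
```

A straight line is only the simplest choice; time series with seasonal or cyclical movements need the fuller methods of time series analysis.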

1.6.4 Estimation:

One of the main objectives of statistics is to draw inferences about a population from the analysis of a sample drawn from that population. The major branches of statistical inference include

1. Estimation theory

2. Tests of Hypothesis

3. Non Parametric tests

In estimation theory, we estimate the unknown value of the population parameter based on the sample observations. Suppose we are given a sample of the heights of one hundred students in a school. Based upon the heights of these 100 students, it is possible to estimate the average height of all students in that school.
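The idea can be sketched as follows: the sample mean serves as the point estimate of the population mean, and the standard error indicates how precise that estimate is. The heights are hypothetical, and only ten values are used instead of one hundred to keep the example short; `estimate_mean` is an illustrative name.

```python
from math import sqrt
from statistics import mean, stdev

def estimate_mean(sample):
    """Return the point estimate of the population mean and its standard error."""
    point_estimate = mean(sample)
    # Standard error = sample standard deviation / sqrt(sample size).
    standard_error = stdev(sample) / sqrt(len(sample))
    return point_estimate, standard_error

# Hypothetical heights (cm) of sampled students (assumed data).
heights_cm = [150, 152, 155, 149, 158, 160, 153, 151, 157, 155]

estimate, se = estimate_mean(heights_cm)
print(f"Estimated average height: {estimate} cm (standard error {se:.2f} cm)")
```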

1.6.5 Tests of Hypothesis:

A statistical hypothesis is some statement about the probability distribution, characterising a population on the basis of the information available from the sample observations. In the formulation and testing of hypothesis, statistical methods are extremely useful. Whether crop yield has increased because of the use of new fertilizer or whether the new medicine is effective in eliminating a particular disease are some examples of statements of hypothesis and these are tested by proper statistical tools.
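For the fertilizer example, a test statistic can be computed from the yields of plots with and without the new fertilizer. The sketch below uses the two-sample t statistic in its unequal-variance (Welch) form, which is one possible choice and not necessarily the test intended by the text; the yields and the function name are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t_statistic(sample_a, sample_b):
    """t statistic for the difference of two sample means (unequal variances)."""
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error

# Hypothetical crop yields per plot (assumed data).
new_fertilizer = [31, 34, 29, 35, 33, 32]
old_fertilizer = [28, 30, 27, 31, 29, 26]

t_value = welch_t_statistic(new_fertilizer, old_fertilizer)
print(f"t = {t_value:.2f}")
```

A value of t far from zero counts as evidence against the hypothesis of equal mean yields; turning it into a formal decision requires the t distribution, which is not shown here.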

1.7 Scope of Statistics:

Statistics is not a mere device for collecting numerical data, but a means of developing sound techniques for handling and analysing them and for drawing valid inferences from them. Statistics is applied in every sphere of human activity, social as well as physical, such as Biology, Commerce, Education, Planning, Business Management, Information Technology, etc. It is almost impossible to find a single department of human activity where statistics cannot be applied. We now discuss briefly the applications of statistics in other disciplines.

1.7.1 Statistics and Industry:

Statistics is widely used in many industries. In industry, control charts are widely used to maintain a certain quality level. In production engineering, statistical tools such as inspection plans and control charts are of extreme importance in finding out whether or not a product conforms to specifications. In inspection plans we have to resort to some kind of sampling, a very important aspect of Statistics.
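A much simplified version of the control-chart idea is sketched below. It assumes control limits placed three standard deviations either side of the mean of reference measurements taken while the process was in control; the measurements and function names are hypothetical, and practical control charts usually work with subgroup means and ranges.

```python
from statistics import mean, pstdev

def control_limits(reference):
    """Return (lower limit, centre line, upper limit) from in-control data."""
    centre = mean(reference)
    sigma = pstdev(reference)
    return centre - 3 * sigma, centre, centre + 3 * sigma

def out_of_control(measurements, lower, upper):
    """Return the measurements that fall outside the control limits."""
    return [x for x in measurements if x < lower or x > upper]

# Hypothetical component diameters (mm) from an in-control period.
reference = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
lower, centre, upper = control_limits(reference)

# New measurements to be checked against the limits.
flagged = out_of_control([10.1, 9.6, 10.7, 9.3], lower, upper)
print(f"limits: {lower:.2f} to {upper:.2f}, flagged: {flagged}")
```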

1.7.2 Statistics and Commerce:

Statistics are the lifeblood of successful commerce. No businessman can afford either to understock or to overstock his goods. In the beginning he estimates the demand for his goods and then takes steps to adjust his output or purchases accordingly. Thus statistics is indispensable in business and commerce. As so many multinational companies have entered the Indian economy, the size and volume of business is increasing. On one side stiff competition is increasing, whereas on the other side tastes are changing and new fashions are emerging. In this connection, market surveys play an important role in exhibiting present conditions and forecasting likely changes in the future.

1.7.3 Statistics and Agriculture:

Analysis of variance (ANOVA), one of the statistical tools developed by Professor R.A. Fisher, plays a prominent role in agricultural experiments. In tests of significance based on small samples, it can be shown that the t-statistic is adequate to test the significance of the difference between two sample means. In analysis of variance, we are concerned with testing the equality of several population means.

For example, five fertilizers are applied to five plots each of wheat, and the yield of wheat on each of the plots is given. In such a situation, we are interested in finding out whether the effects of these fertilisers on the yield are significantly different or not; in other words, whether the samples are drawn from the same normal population or not. The answer to this problem is provided by the technique of ANOVA, which is used to test the homogeneity of several population means.
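The calculation behind a one-way ANOVA can be sketched in a few lines: the F ratio compares the variation between the fertilizer groups with the variation within them. The yields below are hypothetical and `one_way_anova_f` is an illustrative name.

```python
def one_way_anova_f(groups):
    """F = between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical wheat yields: five fertilizers, five plots each (assumed data).
yields = [
    [20, 22, 21, 23, 19],
    [25, 27, 26, 24, 28],
    [22, 21, 23, 20, 24],
    [30, 29, 31, 28, 32],
    [18, 20, 19, 21, 17],
]

f_value = one_way_anova_f(yields)
print(f"F = {f_value:.2f} with {len(yields) - 1} and "
      f"{sum(len(g) for g in yields) - len(yields)} degrees of freedom")
```

A large F suggests that at least one fertilizer mean differs from the others; the formal decision compares F with a critical value of the F distribution, which is not shown here.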

1.7.4 Statistics and Economics:

Statistical methods are useful in measuring numerical changes in complex groups and interpreting collective phenomena. Nowadays statistics is used abundantly in any economic study. Both in economic theory and practice, statistical methods play an important role. Alfred Marshall said, "Statistics are the straw out of which I, like every other economist, have to make the bricks". It may also be noted that statistical data and statistical techniques are immensely useful in solving many economic problems, such as those of wages, prices, production, distribution of income and wealth and so on. Statistical tools like index numbers, time series analysis, estimation theory and testing of statistical hypotheses are extensively used in economics.

1.7.5 Statistics and Education:

Statistics is widely used in education. Research has become a common feature in all branches of activity. Statistics is necessary for the formulation of policies on starting new courses, for considering the facilities available for new courses etc. Many people are engaged in research work to test past knowledge and to evolve new knowledge. This is possible only through statistics.

1.7.6 Statistics and Planning:

Statistics is indispensable in planning. In the modern world, which can be termed the "world of planning", almost all the organisations in the government are seeking the help of planning for efficient working, for the formulation of policy decisions and for their execution. In order to achieve these goals, statistical data relating to production, consumption, demand, supply, prices, investments, income, expenditure etc., and various advanced statistical techniques for processing, analysing and interpreting such complex data, are of importance. In India statistics plays an important role in planning and commissioning at both the central and state government levels.

1.7.7 Statistics and Medicine:

In medical sciences, statistical tools are widely used. To test the efficiency of a new drug or medicine, the t-test is used, and to compare the efficiency of two drugs or two medicines, the t-test for two samples is used. More and more applications of statistics are at present used in clinical investigation.

1.7.8 Statistics and Modern applications:

Recent developments in the fields of computer technology and information technology have enabled statisticians to integrate their models and thus make statistics a part of the decision-making procedures of many organisations. There are many software packages available for solving problems in design of experiments, forecasting, simulation etc. SYSTAT, a software package, offers more scientific and technical graphing options than any other desktop statistics package. SYSTAT supports all types of scientific and technical research in various diversified fields, as follows:

1. Archeology: Evolution of skull dimensions

2. Epidemiology: Tuberculosis

4. Manufacturing: Quality improvement

5. Medical research: Clinical investigations.

6. Geology: Estimation of Uranium reserves from ground water.

1.8 Limitations of statistics:

Statistics, with all its wide application in every sphere of human activity, has its own limitations. Some of them are given below.

1.8.1 Statistics is not suitable to the study of qualitative phenomenon: Since statistics is basically a science and deals with a set of numerical data, it is applicable only to the study of those subjects of enquiry which can be expressed in terms of quantitative measurements. As a matter of fact, qualitative phenomena like honesty, poverty, beauty, intelligence etc. cannot be expressed numerically, and statistical analysis cannot be directly applied to them. Nevertheless, statistical techniques may be applied indirectly by first reducing the qualitative expressions to accurate quantitative terms. For example, the intelligence of a group of students can be studied on the basis of their marks in a particular examination.

1.8.2 Statistics does not study individuals: Statistics does not give any specific importance to individual items; in fact it deals with an aggregate of objects. Individual items, when they are taken individually, do not constitute any statistical data and do not serve any purpose for any statistical enquiry.

1.8.3 Statistical laws are not exact: It is well known that mathematical and physical sciences are exact. But statistical laws are not exact; they are only approximations. Statistical conclusions are not universally true. They are true only on an average.

1.8.4 Statistics table may be misused: Statistics must be used only by experts; otherwise, statistical methods are the most dangerous tools in the hands of the inexpert. The use of statistical tools by inexperienced and untrained persons might lead to wrong conclusions. Statistics can be easily misused by quoting wrong figures of data. As King says aptly, 'statistics are like clay of which one can make a God or Devil as one pleases'.

1.8.5 Statistics is only one of the methods of studying a problem:

Statistical methods do not provide a complete solution to problems, because problems are to be studied taking the background of the country's culture, philosophy or religion into consideration. Thus the statistical study should be supplemented by other evidence.

1.9 Distrust Of Statistics

It is often said that "statistics can prove anything." There are three types of lies - lies, damned lies and statistics - wicked in the order of their naming. A Paris banker said, "Statistics is like a miniskirt, it covers up essentials but gives you the ideas."

Thus by "distrust of statistics" we mean lack of confidence in statistical statements and methods. The following reasons account for such views about statistics. Figures are convincing and, therefore, people easily believe them. They can be manipulated in such a manner as to establish foregone conclusions. The wrong representation of even correct figures can mislead a reader. For example, John earned $4000 in 1990 - 1991 and Jem earned $5000. Reading this, one would form the opinion that Jem is decidedly a better worker than John. However, if we carefully examine the statement, we might reach a different conclusion, as Jem's earning period is unknown to us. Thus, while working with statistics, one should not only avoid outright falsehoods but also be alert to detect possible distortion of the truth.

1.10 Uses of Statistics :

1.10.1 To present the data in a concise and definite form : Statistics helps in classifying and tabulating raw data for processing and further tabulation for end users.

1.10.2 To make it easy to understand complex and large data : This is done by presenting the data in the form of tables, graphs, diagrams etc., or by condensing the data with the help of means, dispersion etc.

1.10.3 For comparison : Tables, measures of means and dispersion can help in comparing different sets of data.

1.10.4 In forming policies : It helps in forming policies like a production schedule, based on the relevant sales figures. It is used in forecasting future demands.

1.10.5 Enlarging individual experiences : Complex problems can be well understood by statistics, as the conclusions drawn by an individual are more definite and precise than mere statements on facts.

1.10.6 In measuring the magnitude of a phenomenon : Statistics has made it possible to count the population of a country and to measure industrial growth, agricultural growth and the educational level (in numbers, of course).

1.11 Types of Statistics

As mentioned earlier, for a layman or people in general, statistics means numbers - numerical facts, figures or information. The branch of statistics wherein we record and analyze observations for all the individuals of a group or population and draw inferences about the same is called "Descriptive statistics" or "Deductive statistics". On the other hand, if we choose a sample and, by statistical treatment of this, draw inferences about the population, then this branch of statistics is known as Statistical Inference or Inductive Statistics.

In our discussion, we are mainly concerned with two ways of representing descriptive statistics : Numerical and Pictorial.

1. Numerical statistics are numbers. But some numbers are more meaningful, such as the mean, standard deviation etc.

2. When numerical data is presented in the form of pictures (diagrams) and graphs, it is called pictorial statistics. This kind of statistics makes confusing and complex data or information easy, simple and straightforward, so that even the layman can understand it without much difficulty.

1.12 Common Mistakes Committed In Interpretation of Statistics

1.12.1 Bias:- Bias means prejudice or preference of the investigator, which creeps in consciously or unconsciously in proving a particular point.

1.12.2 Generalization:- Sometimes general conclusions are drawn on the basis of the little data available.

1.12.3 Wrong conclusion:- The characteristics of a group, if attached to an individual member of that group, may lead us to draw absurd conclusions.

1.12.4 Incomplete classification:- If we fail to give a complete classification, the influence of various factors may not be properly understood.

1.12.5 There may be a wrong use of percentages.

1.12.6 Technical mistakes may also occur.

1.12.7 An inconsistency in definition can even exist.

1.12.8 Wrong causal inferences may sometimes be drawn.

1.12.9 There may also be a misuse of correlation.

Chapter One: End Chapter Quizzes

1. The statement, “Statistics is both a science and an art”, was given by

a- R. A. Fisher
b- Tippett
c- L. R. Connor
d- A. L. Bowley

2. The word “statistics” is used as

a- Singular
b- Plural
c- Singular and plural both
d- none of the above

3. “Statistics provides tools and techniques for research workers”, was stated by

a- John I. Griffin
b- W. I. King
c- A. M. Mood
d- A. L. Boddington

4. Out of various definitions given by the following workers, which definition is considered to be most exact?

a- R. A. Fisher
b- A. L. Bowley
c- M. G. Kendall
d- Cecil H. Meyers

5. Who stated that there are three kinds of lies: lies, damned lies and statistics?

a- Mark Twain
b- Disraeli
c- Darrell Huff
d- G. W. Snedecor

6. Which of the following represents data?

a- a single value
b- only two values in a set
c- a group of values in a set
d- none of the above

7. Statistics deals with

a- qualitative information
b- quantitative information
c- both (a) and (b)
d- none of (a) and (b)

8. Relative error is always

a- positive
b- negative
c- positive and negative both
d- zero

9. The statement, “Designing of an appropriate questionnaire itself wins half the battle”, was given by

a- A. R. Ilersic
b- W. I. King
c- H. Huge
d- H. Secrist

10. Who originally gave the formula for the estimation of errors of the type

a- L. R. Connor
b- W. I. King
c- A. L. Bowley
d- A. L. Boddington

CHAPTER TWO: PRIMARY AND SECONDARY DATA

2.1 Primary Data

The foundation of statistical investigation lies in data, so utmost care must be taken while collecting data. If the collected data are inaccurate or inadequate, the whole analysis and interpretation will also become misleading and unreliable. The method of collection of data depends upon the nature, object and scope of the statistical enquiry on the one hand and the availability of time and money on the other hand. Data, or facts, may be derived from several sources. Data can be classified as primary data and secondary data. Primary data is data gathered for the first time by the researcher. So if the investigator himself collects the data for the purpose of the enquiry and uses the data, it is called collection of primary data. These data are original in nature. According to Horace Secrist, “By primary data are meant those data which are original, that is, those in which little or no grouping has been made, the instances being recorded or itemized as encountered. They are essentially raw material.”

2.2 Sources of Primary Data

Primary data may be collected by using the following methods, namely :

2.2.1 Direct personal investigations : Under this method the investigator personally contacts the informants and collects the data. This method of data collection is suitable where the field of enquiry is limited or the nature of the enquiry is confidential.

2.2.2 Indirect oral investigations : This method is generally used in those cases where informants are reluctant to give information, so information is gathered from those who possess information on the problem under investigation. The informants are called witnesses. This method of investigation is normally used by enquiry committees and commissions.

2.2.3 Information through correspondence : Under this method, the investigator appoints local agents or correspondents in different parts of the field of enquiry. They send information on specific issues to the investigator on a regular basis. This method is generally adopted by various television news channels, newspapers and periodicals.

2.2.4 Mailed questionnaire method : Under this method, a questionnaire is prepared by the investigator containing questions on the problem under investigation. These questionnaires are mailed to various informants, who are requested to return them by mail after answering the questions. A covering letter is also enclosed requesting the informants to reply before a specific date.

2.2.5 Schedule to be filled in by the enumerator : Under this method, enumerators are appointed areawise. They contact the informants and fill up the information in the schedules. The enumerators should be honest, painstaking and tactful as they have to deal with people of different nature.

2.3 Secondary Data

Secondary data is data taken by the researcher from secondary sources, internal or external. The researcher must thoroughly search secondary data sources before commissioning any effort to collect primary data. Once primary data are collected and published, they become secondary data for other investigators.

Hence, the data obtained from published or unpublished sources are known as secondary data. There are many advantages in searching for and analyzing such data before attempting the collection of primary data. In some cases, the secondary data itself may be sufficient to solve the problem. Usually the cost of gathering secondary data is much lower than the cost of organizing primary data. Moreover, secondary data has several supplementary uses. It also helps to plan the collection of primary data, in case that becomes necessary. Blair has rightly defined secondary data as “those already in existence and which have been collected for some other purpose than the answering of the question at hand.”

Secondary data is of two kinds, internal and external. Secondary data, whether internal or external, is data already collected by others, for purposes other than the solution of the problem on hand. Business firms always have a great deal of internal secondary data with them. Sales statistics constitute the most important component of secondary data in marketing and the researcher uses them extensively. All the output of the MIS of the firm generally constitutes internal secondary data. This data is readily available; the market researcher gets it without much effort, time and money.

2.4 The nature of secondary sources of information

Secondary data is data which has been collected by individuals or agencies for purposes other than those of our particular research study. For example, if a government department has conducted a survey of, say, family food expenditures, then a food manufacturer might use this data in the organisation's evaluations of the total potential market for a new product. Similarly, statistics prepared by a ministry on agricultural production will prove useful to a whole host of people and organisations, including those marketing agricultural supplies.

No marketing research study should be undertaken without a prior search of secondary sources (also termed desk research). There are several grounds for making such a bold statement.

Secondary data may be available which is entirely appropriate and wholly adequate to draw conclusions and answer the question or solve the problem. Sometimes primary data collection simply is not necessary.

It is far cheaper to collect secondary data than to obtain primary data. For the same level of research budget a thorough examination of secondary sources can yield a great deal more information than can be had through a primary data collection exercise.

The time involved in searching secondary sources is much less than that needed to complete primary data collection.

Secondary sources of information can yield more accurate data than that obtained through primary research. This is not always true but where a government or international agency has undertaken a large scale survey, or even a census, this is likely to yield far more accurate results than custom designed and executed surveys when these are based on relatively small sample sizes.

It should not be forgotten that secondary data can play a substantial role in the exploratory phase of the research when the task at hand is to define the research problem and to generate hypotheses. The assembly and analysis of secondary data almost invariably improves the researcher's understanding of the marketing problem, the various lines of inquiry that could or should be followed and the alternative courses of action which might be pursued.

Secondary sources help define the population. Secondary data can be extremely useful both in defining the population and in structuring the sample to be taken. For instance, government statistics on a country's agriculture will help decide how to stratify a sample and, once sample estimates have been calculated, these can be used to project those estimates to the population.

2.5 Sources of Secondary data

Secondary sources of data may be divided into two categories: internal sources and external sources.

2.5.1 Internal sources of secondary data

Sales data : All organisations collect information in the course of their everyday operations. Orders are received and delivered, costs are recorded, sales personnel submit visit reports, invoices are sent out, returned goods are recorded and so on. Much of this information is of potential use in marketing research but surprisingly little of it is actually used. Organisations frequently overlook this valuable resource by not beginning their search of secondary sources with an internal audit of sales invoices, orders, inquiries about products not stocked, returns from customers and sales force customer calling sheets. For example, consider how much information can be obtained from sales orders and invoices:

Sales by territory
Sales by customer type
Prices and discounts
Average size of order by customer, customer type, geographical area
Average sales by sales person and
Sales by pack size and pack type, etc.

This type of data is useful for identifying an organisation's most profitable products and customers. It can also serve to track trends within the enterprise's existing customer group.

Financial data: An organisation has a great deal of data within its files on the cost of producing, storing, transporting and marketing each of its products and product lines. Such data has many uses in marketing research, including allowing measurement of the efficiency of marketing operations. It can also be used to estimate the costs attached to new products under consideration and the level of utilisation (in production, storage and transportation) at which an organisation's unit costs begin to fall.

Transport data: Companies that keep good records relating to their transport operations are well placed to establish which are the most profitable routes and loads, as well as the most cost effective routing patterns. Good data on transport operations enables the enterprise to perform trade-off analysis and thereby establish whether it makes economic sense to own or hire vehicles, or the point at which a balance of the two gives the best financial outcome.

Storage data: Data on the rate of stockturn and on stockhandling costs help in assessing the efficiency of certain marketing operations and the efficiency of the marketing system as a whole. More sophisticated accounting systems assign costs to the cubic space occupied by individual products and the time period over which the product occupies the space. These systems can be further refined so that the profitability per unit, and rate of sale, are added. In this way, the direct product profitability can be calculated.

2.5.2 External sources of secondary information

The marketing researcher who seriously seeks after useful secondary data is more often surprised by its abundance than by its scarcity. Too often, the researcher has secretly (sometimes subconsciously) concluded from the outset that his/her topic of study is so unique or specialised that a search of secondary sources is futile. Consequently, only a superficial search is made with no real expectation of success. Cursory searches become a self-fulfilling prophecy. Dillon et al. give the following advice:

"You should never begin a half-hearted search with the assumption that what is being sought is so unique that no one else has ever bothered to collect it and publish it. On the contrary, assume there are scrolling secondary data that should help provide definition and scope for the primary research effort."

The same authors support their advice by citing the large numbers of organisations that provide marketing information including national and local government agencies, quasi-government agencies, trade associations, universities, research institutes, financial institutions, specialist suppliers of secondary marketing data and professional marketing research enterprises. Dillon et al. further advise that searches of printed sources of secondary data begin with referral texts such as directories, indexes, handbooks and guides. These sorts of publications rarely provide the data in which the researcher is interested but serve in helping him/her locate potentially useful data sources. The main external sources of secondary data are :

(1) Government (federal, state and local)
(2) Trade associations
(3) Commercial services
(4) National and international institutions.

Government statistics : These may include all or some of the following:

Population censuses
Social surveys, family expenditure surveys
Import/export statistics
Production statistics
Agricultural statistics.

Trade associations : Trade associations differ widely in the extent of their data collection and information dissemination activities. However, it is worth checking with them to determine what they do publish. At the very least one would normally expect that they would produce a trade directory and, perhaps, a yearbook.

Commercial services : Published market research reports and other publications are available from a wide range of organisations which charge for their information. Typically, marketing people are interested in media statistics and consumer information which has been obtained from large scale consumer or farmer panels. The commercial organisation funds the collection of the data, which is wide ranging in its content, and hopes to make its money from selling this data to interested parties.

National and international institutions : Bank economic reviews, university research reports, journals and articles are all useful sources to contact. International agencies such as World Bank, IMF, IFAD, UNDP, ITC, FAO and ILO produce a plethora of secondary data which can prove extremely useful to the marketing researcher.

2.5.3 Examples of Sources of External Secondary Data

Following are some of the examples of sources of external secondary data :

The Internet is a great source of external secondary data. Many published statistics and figures are available on the internet either free or for a fee.

The yellow pages of telephone directories/stand alone yellow pages have become an established source of elementary business information. Tata Press, which first launched a stand alone yellow pages directory for Mumbai City, and 'GETIT' yellow pages have been leading in this field. Today, yellow pages publications are available for all cities and major towns in the country. New Horizons, a joint venture between the Living Media group of publications and Singapore Telecom, has been publishing stand alone directories for specific businesses. Business India data base of the Business India publications had been publishing the Delhi Pages directory.

The Thomas Register is the world's most powerful industrial buying guide. It ensures a fast, frictionless flow of information between buyers and sellers of industrial goods and services. This purchasing tool is now available in India. The Thomas Register of Indian Manufacturers, or TRIM, is India's first dedicated manufacturer-to-manufacturer register. It features 120,000 listings of 40,000 industrial manufacturers and industrial service categories. It is available in print, on CD and on the internet.

The Source Directory brought out by Mumbai based Source Publishers is another example. It covers contact information on advertising agencies and related services and products, music companies, market research agencies, marketing and sales promotion consultants, publications, radio stations, cable and satellite stations and telemarketing services, among others. It currently has editions for metro cities.

The Industrial Product Finder (IPF): IPF details the many applications of new products and tells what is available and from whom. Most manufacturers of industrial products ensure that a description of their product is published in IPF before it hits the market.

Phone data service: Agencies providing phone data services have also come up in major cities in recent times. Melior Communication, for example, offers a tele-data service. Basic data on a number of subjects/products can be had through a call to the agency. The service is termed Tell me Business through phone service. Its main aim, like that of yellow pages, is to bring buyers and sellers of products together. It also provides some elementary databank support to researchers.

2.6 The problems of secondary sources

Whilst the benefits of secondary sources are considerable, their shortcomings have to be acknowledged. There is a need to evaluate the quality of both the source of the data and the data itself. The main problems may be categorized as follows:

Definitions

The researcher has to be careful, when making use of secondary data, of the definitions used by those responsible for its preparation. Suppose, for example, researchers are interested in rural communities and their average family size. If published statistics are consulted then a check must be done on how terms such as "family size" have been defined. They may refer only to the nuclear family or include the extended family. Even apparently simple terms such as 'farm size' need careful handling. Such figures may refer to any one of the following: the land an individual owns, the land an individual owns plus any additional land he/she rents, the land an individual owns minus any land he/she rents out, all of his/her land or only that part of it which he/she actually cultivates. It should be noted that definitions may change over time and where this is not recognised erroneous conclusions may be drawn. Geographical areas may have their boundaries redefined, units of measurement and grades may change and imported goods can be reclassified from time to time for purposes of levying customs and excise duties.

Measurement error

When a researcher conducts fieldwork she/he is possibly able to estimate inaccuracies in measurement through the standard deviation and standard error, but these are sometimes not published in secondary sources. The only solution is to try to speak to the individuals involved in the collection of the data to obtain some guidance on the level of accuracy of the data. The problem is sometimes not so much 'error' but differences in levels of accuracy required by decision makers. When the research has to do with large investments in, say, food manufacturing, management will want to set very tight margins of error in making market demand estimates. In other cases, having a high level of accuracy is not so critical. For instance, if a food manufacturer is merely assessing the prospects for one more flavour for a snack food already produced by the company then there is no need for highly accurate estimates in order to make the investment decision.

Source bias

Researchers have to be aware of vested interests when they consult secondary sources. Those responsible for their compilation may have reasons for wishing to present a more optimistic or pessimistic set of results for their organization. It is not unknown, for example, for officials responsible for estimating food shortages to exaggerate figures before sending aid requests to potential donors. Similarly, and with equal frequency, commercial organizations have been known to inflate estimates of their market shares.

Reliability

The reliability of published statistics may vary over time. It is not uncommon, for example, for the systems of collecting data to have changed over time but without any indication of this to the reader of published statistics. Geographical or administrative boundaries may be changed by government, or the basis for stratifying a sample may have altered. Other aspects of research methodology that affect the reliability of secondary data are the sample size, response rate, questionnaire design and modes of analysis.

Time scale

Most censuses take place at 10 year intervals, so data from this and other published sources may be out-of-date at the time the researcher wants to make use of the statistics. The time period during which secondary data was first compiled may have a substantial effect upon the nature of the data. For instance, the significant increase in the price obtained for Ugandan coffee in the mid-90's could be interpreted as evidence of the effectiveness of the rehabilitation programme that set out to restore coffee estates which had fallen into a state of disrepair. However, more knowledgeable coffee market experts would interpret the rise in Ugandan coffee prices in the context of large scale destruction of the Brazilian coffee crop, due to heavy frosts, in 1994, Brazil being the largest coffee producer in the world.

Whenever possible, marketing researchers ought to use multiple sources of secondary data. In this way, these different sources can be cross-checked as confirmation of one another. Where differences occur an explanation for these must be found or the data should be set aside.

2.7 Difference between Primary & Secondary Data

The difference between primary data and secondary data can be studied under the following points :

Primary research entails the use of first-hand data in assessing a market. The popular ways to collect primary data are surveys, interviews and focus groups, all of which involve direct contact between the company and its potential customers. Secondary research, on the other hand, is a means of reprocessing and reusing information that has already been collected, as a guide to improving a service or product. Both primary and secondary data are useful for businesses, but they differ from each other in various respects.

In secondary data, the information relates to a past period. Hence it lacks timeliness and its value is correspondingly lower. Primary data is more useful in this respect, as it shows the latest information.

Secondary data is obtained from an organization other than the one immediately concerned with the current research project. It was collected and analyzed by that organization to meet the requirements of its own research objectives. Primary data is gathered by the researcher specifically to meet the research objective of the project at hand.

Secondary data, though old, may be the only possible source of the desired data on subjects for which primary data cannot be collected at all. For example, survey reports or secret records already collected by a business group can offer information that cannot be obtained from original sources.

The form in which secondary data are accumulated and delivered may not suit the exact needs and particular requirements of the current research study. Many a time, alterations or modifications to fit the exact needs of the investigator may not be sufficient. To that extent, the usefulness of secondary data is lost. Primary data is completely tailor-made and there is no problem of adjustment.

Secondary data is available easily, quickly and inexpensively. Primary data takes a lot of time to collect and the unit cost of such data is relatively high.

Chapter Two: End Chapter Quizzes

1. Statistical results are

a- cent per cent correct
b- not absolutely correct
c- always incorrect
d- misleading

2. Data taken from the publication 'Agricultural Situation in India' will be considered as

a- primary data
b- secondary data
c- primary and secondary data
d- neither primary nor secondary

3. Mailed questionnaire methods of enquiry can be adopted if respondents

a- live in cities
b- have high income
c- are educated
d- are known

4. Statistical data are collected for

a- collecting data without any purpose
b- a given purpose
c- any purpose
d- none of the above

5. Method of complete enumeration is applicable for

a- Knowing the production
b- Knowing the population
c- Knowing the quantum of export and import
d- All the above

6. A statistical population may consist of

a- an infinite number of items
b- a finite number of items
c- either of (a) and (b)
d- none of (a) and (b)

7. Which of the following examples does not constitute an infinite population?

a- Population consisting of odd numbers
b- Population of weights of newly born babies
c- Population of heights of 15-year-old children
d- Population of heads and tails in tossing a coin successively

8. Which of the following can be classified as hypothetical population?

a- All labourers of a factory
b- Female population of a factory
c- Population of real numbers between 0 and 100
d- students of the world

9. A study based on complete enumeration is known as

a- sample survey
b- pilot survey
c- census survey
d- none of the above

10. Statistical results are

a- absolutely correct
b- not true
c- true on average
d- universally true

CHAPTER THREE : MEASURES OF DISPERSION

3.1 Meaning

There may be variations in the items of different distributions from the average despite the fact that they have the same value of mean. Hence, the measures of central tendency alone are incapable of giving a complete description of the distributions. They have to be supplemented by some other measures.

3.2 Definitions :

“Dispersion is the measure of the variation of the items.”

---- A.L. Bowley

“Dispersion is the measure of the extent to which the individual items vary.”

---- L.R. Connor

Dispersion is measured as the arithmetic mean of the deviations of the values of the individual items from the particular measure of central tendency used. Thus 'dispersion' is also known as the "average of the second degree." Prof. Griffin and Dr. Bowley said the same about dispersion.

3.3 Types of Dispersion :

Dispersion can be divided into the following types :

3.3.1 Absolute Dispersion : It is measured in the same statistical unit in which the original data exist, e.g., kg, rupee, years etc.

3.3.2 Relative Dispersion : Absolute dispersion fails to allow comparison between two series, specially when the statistical unit is not the same. Hence, absolute dispersion has to be converted into a relative measure of dispersion. Relative dispersion is measured in ratio form. It is also called the coefficient of dispersion.

The measures of central tendency (i.e. means) indicate the general magnitude of the data and locate only the center of a distribution of measures. They do not establish the degree of variability, or the spread or scatter, of the individual items and their deviation from (or difference with) the means.

i) According to Neiswanger, "Two distributions of statistical data may be symmetrical and have common means, medians and modes and identical frequencies in the modal class. Yet with these points in common they may differ widely in the scatter or in their values about the measures of central tendencies."

ii) Simpson and Kafka said, "An average alone does not tell the full story. It is hardly fully representative of a mass, unless we know the manner in which the individual items scatter around it. A further description of a series is necessary, if we are to gauge how representative the average is."

From this discussion we now focus our attention on the scatter or variability, which is known as dispersion. Let us take the following three sets.

Students   Group X   Group Y   Group Z
1          50        45        30
2          50        50        45
3          50        55        75
mean =     50        50        50

Thus, the three groups have the same mean, i.e. 50. In fact the medians of groups X and Y are also equal. Now if one were to say that the students from the three groups are of equal capabilities, it would be a totally wrong conclusion. Close examination reveals that in group X all students have marks equal to the mean, in group Y the marks are very close to the mean, but in the third group Z the marks are widely scattered. It is thus clear that a measure of central tendency alone is not sufficient to describe the data.
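The comparison of the three groups can be checked with a few lines of Python. This is only an illustrative sketch: the helper name value_range is our own, and the spread is judged here simply by the difference between the highest and the lowest mark in each group.

```python
from statistics import mean, median

# Marks of the three groups from the table above.
groups = {
    "X": [50, 50, 50],
    "Y": [45, 50, 55],
    "Z": [30, 45, 75],
}

def value_range(values):
    """Difference between the largest and the smallest value."""
    return max(values) - min(values)

# For every group: (mean, median, spread between highest and lowest mark).
summary = {name: (mean(v), median(v), value_range(v)) for name, v in groups.items()}

for name, (m, med, spread) in summary.items():
    print(f"Group {name}: mean={m}, median={med}, spread={spread}")
```

All three groups report a mean of 50, but the spread grows from 0 in group X to 10 in group Y and 45 in group Z, which is exactly the information the mean hides.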

3.4 Features of an ideal measure of dispersion

An ideal measure of dispersion must possess the following features :

Simple to understand

Easy to compute

Well defined measure

Based on all the items of data

Capable of algebraic treatment

Should not be affected by the extreme items.

3.5 Methods of measuring Dispersion

Dispersion can be calculated by using any of the following methods :

3.5.1 Range

3.5.2 Quartile Deviation

3.5.3 Mean Deviation

3.5.4 Standard Deviation

3.5.5 Co-efficient of Variation

3.5.1 Range

In any statistical series, the difference between the largest and the smallest values is called the range.

Thus Range (R) = L - S, where L = largest value and S = smallest value.

Coefficient of Range : This is the relative measure of the range. It is used in the comparative study of dispersion.

Co-efficient of Range = (L - S) / (L + S)

Example ( Individual series ) Find the range and the co-efficient of the range of the following items :

110, 117, 129, 197, 190, 100, 100, 178, 255, 790.

Solution: R = L - S = 790 - 100 = 690

Co-efficient of range = (L - S) / (L + S) = (790 - 100) / (790 + 100) = 690 / 890 = 0.775

Example ( Discrete Series ) Find the range and the co-efficient of the range of the following items :

x   8   10   12   13   14   17
f   3    8   12   10    6    4

Solution

X     f
8     3
10    8
12    12
13    10
14    6
17    4

Range = L - S = 17 - 8 = 9

Coefficient of Range = (L - S) / (L + S) = (17 - 8) / (17 + 8) = 9/25 = 0.36
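The two range calculations above can be reproduced with a short Python sketch. The function name range_and_coefficient is only an illustrative choice. Note that for a discrete series the frequencies play no part in the range, so only the x values are passed in.

```python
def range_and_coefficient(values):
    """Return (range, coefficient of range) = (L - S, (L - S) / (L + S))."""
    largest, smallest = max(values), min(values)
    r = largest - smallest
    return r, r / (largest + smallest)

# Individual series from the first example.
individual = [110, 117, 129, 197, 190, 100, 100, 178, 255, 790]
# Discrete series: only the x values matter, not the frequencies.
discrete_x = [8, 10, 12, 13, 14, 17]

r1, c1 = range_and_coefficient(individual)
r2, c2 = range_and_coefficient(discrete_x)

print(r1, round(c1, 3))  # 690 0.775
print(r2, round(c2, 2))  # 9 0.36
```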

Example ( Continuous Series ) Find the range and the co-efficient of the range of the following items :

X (marks)      0-10   10-20   20-30   30-40   40-50
F (students)   5      8       12      6       4

Solution

X (Marks)   F (Students)
0-10        5
10-20       8
20-30       12
30-40       6
40-50       4

Range = L - S = 50 - 0 = 50

Coefficient of Range = (L - S) / (L + S) = (50 - 0) / (50 + 0) = 50/50 = 1

3.5.2 Quartile Deviations

If we concentrate on the two extreme values ( as in the case of range ), we don't get any idea about the scatter of the data within the range ( i.e. between the two extreme values ). If we discard these two values, the limited range thus available might be more informative. For this reason the concept of interquartile range is developed. It is the range which includes the middle 50% of the distribution. Here 1/4 ( one quarter ) of the observations at the lower end and 1/4 ( one quarter ) at the upper end are excluded.

Now the lower quartile ( Q1 ) is the 25th percentile and the upper quartile ( Q3 ) is the 75th percentile. It is interesting to note that the 50th percentile is the middle quartile ( Q2 ), which is in fact what you have studied under the title "Median". Thus symbolically

Interquartile range = Q3 - Q1

If we divide ( Q3 - Q1 ) by 2 we get what is known as the Semi-interquartile range or Quartile Deviation.

Q.D. = (Q3 - Q1)/2, where Q1 = First quartile and Q3 = Third quartile

Relative or Coefficient of Q.D. :

To find the coefficient of Q.D., we divide the semi-interquartile range, (Q3 - Q1)/2, by the semi-sum of the quartiles, (Q3 + Q1)/2. Symbolically :

Coefficient of Q.D. = (Q3 - Q1) / (Q3 + Q1)

Example ( Individual Series ) Find the quartile deviation and its co-efficient from the following items :

X(marks)

5

8

10

12

15

9

11

12

15

20

Solution

 S. No. X(Marks) Revised X (In ascending order) 1 5 5 2 8 8 3 10 9 4 12 10 5 15 11 6 9 12 7 11 12 8 12 15 9 15 15 10 20 20

Q1 = Size of the (N + 1)/4 th item, where N = No. of items in the data

= (10 + 1)/4 th item = 11/4 = 2.75th item

2.75th item = 2nd item + (3rd item - 2nd item) × 75/100

= 8 + (9 - 8) × 3/4 = 8 + 0.75 = 8.75

Q3 = Size of the 3(N + 1)/4 th item

= 3(10 + 1)/4 th item = 33/4 = 8.25th item

8.25th item = 8th item + (9th item - 8th item) × 25/100

= 15 + (15 - 15) × 1/4 = 15 + 0 = 15

Q.D. = (Q3 - Q1)/2 = (15 - 8.75)/2 = 3.125

Coefficient of Q.D. = (Q3 - Q1)/(Q3 + Q1) = (15 - 8.75)/(15 + 8.75) = 6.25/23.75 = 0.26
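The positional rule used in this solution can be written as a short Python sketch. It assumes the (N + 1)/4 convention with linear interpolation between neighbouring items, exactly as in the working above; other textbooks and software use slightly different quartile conventions, and the function names here are illustrative only.

```python
def item_at(sorted_values, position):
    """Value of the position-th item (1-based, possibly fractional),
    interpolating linearly between the two neighbouring items."""
    whole = int(position)
    fraction = position - whole
    lower = sorted_values[whole - 1]
    if fraction == 0:
        return lower
    upper = sorted_values[whole]
    return lower + fraction * (upper - lower)

def quartile_deviation(values):
    """Return (Q1, Q3, Q.D., coefficient of Q.D.); assumes at least 3 items."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = item_at(ordered, (n + 1) / 4)
    q3 = item_at(ordered, 3 * (n + 1) / 4)
    qd = (q3 - q1) / 2
    coefficient = (q3 - q1) / (q3 + q1)
    return q1, q3, qd, coefficient

marks = [5, 8, 10, 12, 15, 9, 11, 12, 15, 20]
q1, q3, qd, coefficient = quartile_deviation(marks)
print(q1, q3, qd, round(coefficient, 2))
```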

Example (Discrete Series) Find the quartile deviation and its co-efficient from the following data :

Solution

 Central size of items (x)    Frequency (f)    c.f.
 2                            2                2
 3                            3                5
 4                            5                10
 5                            6                16
 6                            8                24
 7                            12               36
 8                            16               52
 9                            7                59
 10                           5                64
 11                           4                68

N = 68

Q1 = Size of the (N + 1)/4 th item

= (68 + 1)/4 th item = 69/4 = 17.25th item

The 17.25th item lies in c.f. 24, and against it the value of X = 6

Q1 = 6

Q3 = Size of the 3(N + 1)/4 th item

= 3(68 + 1)/4 th item = (3 × 69)/4 = 51.75th item

The 51.75th item lies in c.f. 52, and against it the value of X = 8

Q3 = 8

Q.D. = (Q3 - Q1)/2 = (8 - 6)/2 = 1

Coefficient of Q.D. = (Q3 - Q1)/(Q3 + Q1) = (8 - 6)/(8 + 6) = 2/14 = 0.143
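For a discrete series the quartile is read off the cumulative frequency column rather than interpolated. A minimal Python sketch of that lookup, using the data of this example (the helper name `value_at_item` is an assumption made for illustration):

```python
from itertools import accumulate

def value_at_item(values, frequencies, position):
    """Return the first value whose cumulative frequency reaches the position."""
    for value, cumulative in zip(values, accumulate(frequencies)):
        if cumulative >= position:
            return value
    raise ValueError("position exceeds the total frequency")

x = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
f = [2, 3, 5, 6, 8, 12, 16, 7, 5, 4]
n = sum(f)

q1 = value_at_item(x, f, (n + 1) / 4)       # 17.25th item
q3 = value_at_item(x, f, 3 * (n + 1) / 4)   # 51.75th item
qd = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)
print(q1, q3, qd, round(coefficient, 3))
```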

3.5.3 Mean Deviation

Average deviation (mean deviation) is the average amount of variation (scatter) of the items in a distribution from either the mean, the median or the mode, ignoring the signs of these deviations (Clark and Senkade).

Individual Series

Steps :

(1) Find the mean, median or mode of the given series.

(2) Using any one of the three, find the deviations (differences) of the items of the series from it, i.e. $x_i - \bar{x}$, $x_i - Me$ or $x_i - Mo$, where $\bar{x}$ = Mean, Me = Median and Mo = Mode.

(3) Find the absolute values of these deviations, i.e. ignore their positive (+) and negative (-) signs: $|x_i - \bar{x}|$, $|x_i - Me|$ or $|x_i - Mo|$.

(4) Find the sum of these absolute deviations, i.e. $\sum |x_i - \bar{x}|$, $\sum |x_i - Me|$ or $\sum |x_i - Mo|$.

(5) Find the mean deviation using the following formula:

M.D. $= \dfrac{\sum |x_i - A|}{n}$, where A is the average used ($\bar{x}$, Me or Mo) and n is the number of items.

Note that :

(i) generally the M.D. obtained from the median is the best for practical purposes.

(ii) co-efficient of M.D. $= \dfrac{\text{M.D.}}{A}$, i.e. the M.D. divided by the average from which the deviations were taken.

Merits and Demerits of Mean Deviation

Merits

1. It is a better measure of dispersion than the range and the quartile deviation.

2. This method is based on all the items of the data.

3. The mean deviation is less affected by extreme items than the standard deviation.

Demerits

1. This method lacks algebraic treatment, as signs are ignored while taking deviations from an average.

2. Mean deviation cannot be considered a scientific method, as it ignores signs.

Example Calculate Mean deviation and its co-efficient for the following salaries:

\$ 1030, \$ 500, \$ 680, \$ 1100, \$ 1080, \$ 1740. \$ 1050, \$ 1000, \$ 2000, \$ 2250, \$ 3500 and \$ 1030.

Calculations :

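i) Median (Me) = Size of the (n + 1)/2 th item = Size of the (12 + 1)/2 = 6.5th item of the series arranged in ascending order (500, 680, 1000, 1030, 1030, 1050, 1080, 1100, 1740, 2000, 2250, 3500), i.e. the average of the 6th and 7th items. Therefore, Median (Me) = (1050 + 1080)/2 = \$ 1065

ii) M.D. = (sum of the absolute deviations from the median) / n = 6380/12 = \$ 531.67, and co-efficient of M.D. = M.D./Me = 531.67/1065 = 0.50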
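As a cross-check, the mean deviation about the median can be computed directly from the twelve salaries as they are listed in the example. This is a minimal sketch; the function name `mean_deviation` is an illustrative assumption.

```python
from statistics import median

def mean_deviation(values, centre):
    """Average of the absolute deviations of the values from the chosen centre."""
    return sum(abs(v - centre) for v in values) / len(values)

salaries = [1030, 500, 680, 1100, 1080, 1740, 1050, 1000, 2000, 2250, 3500, 1030]

me = median(salaries)              # average of the two middle items for an even count
md = mean_deviation(salaries, me)
coefficient = md / me
print(me, round(md, 2), round(coefficient, 2))
```

Passing the mean or the mode as `centre` instead gives the other two variants of the measure.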

Example ( Continuous series ) Calculate the mean deviation and the coefficient of mean deviation from the following data using the mean. Difference in ages between boys and girls of a class.

 Diff. in years    No. of students
 0 - 5             449
 5 - 10            705
 10 - 15           507
 15 - 20           281
 20 - 25           109
 25 - 30           52
 30 - 35           16
 35 - 40           4

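Calculation:

1) $\bar{x} = \dfrac{\sum f_i x_i}{n}$, where $x_i$ is the mid-value of each class and $n = \sum f_i$

2) M.D. $= \dfrac{\sum f_i |x_i - \bar{x}|}{n}$

3) Co-efficient of M.D. $= \dfrac{\text{M.D.}}{\bar{x}}$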
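The three steps of this calculation can be sketched in Python as follows. The sketch assumes, as is usual for a continuous series, that every item in a class is represented by the class mid-value; the variable names are illustrative.

```python
# Class limits (difference in years) and number of students, from the table.
classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25), (25, 30), (30, 35), (35, 40)]
frequencies = [449, 705, 507, 281, 109, 52, 16, 4]

# Assumption: each class is represented by its mid-value.
midpoints = [(lower + upper) / 2 for lower, upper in classes]
n = sum(frequencies)

# 1) Arithmetic mean of the grouped data.
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# 2) Mean deviation about the mean.
mean_deviation = sum(f * abs(x - mean) for f, x in zip(frequencies, midpoints)) / n

# 3) Coefficient of mean deviation.
coefficient = mean_deviation / mean

print(n, round(mean, 3), round(mean_deviation, 3), round(coefficient, 2))
```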

3.5.4 Standard Deviation (S. D.)

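It is the square root of the arithmetic mean of the squared deviations of the various values from their arithmetic mean. It is denoted by s.d. or $\sigma_x$.

Thus, s.d.$(x) = \sigma_x = \sqrt{\dfrac{\sum f_i (x_i - \bar{x})^2}{n}} = \sqrt{\dfrac{\sum f_i x_i^2}{n} - \bar{x}^2}$, where $n = \sum f_i$.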

Merits :

(1) It is rigidly defined and based on all observations.

(2) It is amenable to further algebraic treatment.

(3) It is less affected by sampling fluctuations than other measures of dispersion.

(4) It is less erratic.

Demerits :

(1) It is difficult to understand and calculate.

(2) It gives greater weight to extreme values.

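Note that variance $V(x) = \dfrac{\sum f_i (x_i - \bar{x})^2}{n}$ and s.d.$(x) = \sigma_x = \sqrt{V(x)}$.

Then $V(x) = \dfrac{\sum f_i x_i^2}{n} - \bar{x}^2$ and $\sigma_x = \sqrt{\dfrac{\sum f_i x_i^2}{n} - \bar{x}^2}$.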
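The shortcut form of the variance follows from expanding the square. A short derivation, assuming $n = \sum f_i$ and $\bar{x} = \sum f_i x_i / n$:

```latex
\begin{aligned}
V(x) &= \frac{1}{n}\sum f_i (x_i - \bar{x})^2 \\
     &= \frac{1}{n}\sum f_i x_i^2 \;-\; \frac{2\bar{x}}{n}\sum f_i x_i \;+\; \frac{\bar{x}^2}{n}\sum f_i \\
     &= \frac{1}{n}\sum f_i x_i^2 \;-\; 2\bar{x}^2 \;+\; \bar{x}^2 \\
     &= \frac{1}{n}\sum f_i x_i^2 \;-\; \bar{x}^2
\end{aligned}
```

This is why only the two totals $\sum f_i x_i$ and $\sum f_i x_i^2$ are needed to find the standard deviation of a frequency distribution.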

3.5.5 Co-efficient Of Variation ( C. V. )

To compare the variations (dispersion) of two different series, a relative measure of standard deviation must be calculated. This is known as the co-efficient of variation or the co-efficient of s.d. Its formula is C.V. $= \dfrac{\sigma_x}{\bar{x}} \times 100$

Thus it is defined as the ratio of the s.d. to the mean, expressed as a percentage.

Remark: It is given as a percentage and is used to compare the consistency or variability of two or more series. The higher the C.V., the higher the variability; the lower the C.V., the higher the consistency of the data.

Example Calculate the standard deviation and its co-efficient from the following data.

 A 10    B 12    C 16    D 8    E 25    F 30    G 14    H 11    I 13    J 11

Solution

 No.    xi     (xi - x̄)    (xi - x̄)²
 A      10     -5           25
 B      12     -3           9
 C      16     +1           1
 D      8      -7           49
 E      25     +10          100
 F      30     +15          225
 G      14     -1           1
 H      11     -4           16
 I      13     -2           4
 J      11     -4           16

 n = 10    Σxi = 150    Σ(xi - x̄)² = 446

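Calculations :

i) $\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{150}{10} = 15$

ii) s.d. $= \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n}} = \sqrt{\dfrac{446}{10}} = \sqrt{44.6} = 6.68$

iii) C.V. $= \dfrac{\text{s.d.}}{\bar{x}} \times 100 = \dfrac{6.68}{15} \times 100 = 44.5\%$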
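The same calculation can be sketched in Python. It assumes the definition used in this section, i.e. the sum of squared deviations is divided by n (not n - 1); the variable names are illustrative.

```python
import math

values = [10, 12, 16, 8, 25, 30, 14, 11, 13, 11]   # items A to J
n = len(values)

mean = sum(values) / n
squared_deviations = [(x - mean) ** 2 for x in values]

# Standard deviation: square root of the mean of the squared deviations.
sd = math.sqrt(sum(squared_deviations) / n)

# Coefficient of variation, expressed as a percentage.
cv = sd / mean * 100

print(mean, sum(squared_deviations), round(sd, 2), round(cv, 1))
```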

Example Calculate s.d. of the marks of 100 students.

 Marks    No. of students (fi)    Mid-values (xi)    fi xi    fi xi²
 0-2      10                      1                  10       10
 2-4      20                      3                  60       180
 4-6      35                      5                  175      875
 6-8      30                      7                  210      1470
 8-10     5                       9                  45       405

 n = 100    Σ fi xi = 500    Σ fi xi² = 2940

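Solution

1) $\bar{x} = \dfrac{\sum f_i x_i}{n} = \dfrac{500}{100} = 5$

2) s.d. $= \sqrt{\dfrac{\sum f_i x_i^2}{n} - \bar{x}^2} = \sqrt{\dfrac{2940}{100} - 5^2} = \sqrt{29.4 - 25} = \sqrt{4.4} = 2.10$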
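A minimal Python sketch of this grouped calculation, assuming the shortcut formula s.d. = sqrt(Σ f x² / n - mean²) and class mid-values as representatives of each class:

```python
import math

mid_values = [1, 3, 5, 7, 9]          # mid-values of the classes 0-2, 2-4, ..., 8-10
students = [10, 20, 35, 30, 5]        # frequencies

n = sum(students)
sum_fx = sum(f * x for f, x in zip(students, mid_values))
sum_fx2 = sum(f * x * x for f, x in zip(students, mid_values))

mean = sum_fx / n
sd = math.sqrt(sum_fx2 / n - mean ** 2)

print(n, sum_fx, sum_fx2, mean, round(sd, 2))
```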

Chapter Three:- End Chapter Quizzes

1. Which of the following is not a measure of dispersion?

a- mean deviation

b- quartile deviation

c- standard deviation

d- average deviation from mean

2. Which of the following is a unit less measure of dispersion?

a- standard deviation

b- mean deviation

c- coefficient of variation

d- range

3. Which one of the given measures of dispersion is considered best?

a- standard deviation

b- range

c- variance

d- coefficient of variation

4. For comparison of two different series, the best measure of dispersion is

a- range

b- mean deviation

c- standard deviation

d- none of the above

5. Out of all measures of dispersion, the easiest one to calculate is

a- standard deviation

b- range

c- variance

d- quartile deviation

6. Mean deviation is minimum when deviations are taken from

a. mean

b. median

c. mode

d. zero

7. Sum of squares of the deviations is minimum when deviations are taken from

a. mean

b. median

c. mode

d. zero

8. Which measure of dispersion is least affected by extreme values ?

a. range

b. mean deviation

c. standard deviation

d. quartile deviation

9. The average of the sum of squares of the deviations about mean is called

a. variance

b. absolute deviation

c. standard deviation

d. mean deviation

10. Quartile deviation is equal to

a. interquartile range

b. double interquartile range

c. half of the interquartile range

d. none of the above

CHAPTER FOUR:-MEASURES OF SKEWNESS

4.1 Skewness

The voluminous raw data cannot be easily understood. Hence, we calculate the measures of central tendency and obtain a representative figure. From the measures of variability, we can know whether most of the items of the data are close to or away from these central values. But these statistical averages and measures of variation are not enough to draw sufficient inferences about the data. Another aspect of the data is its symmetry. In the chapter "Graphic display" we have seen that a frequency distribution may or may not be symmetrical about the mode. This symmetry is studied through "skewness". One more aspect of the curve that we need to know is the flatness or peakedness of its top. This is understood by what is known as "kurtosis".

4.2 Definitions :

Different authorities have defined skewness in different manners. Some of the definitions are as under :

According to Croxton and Cowden, “When a series is not symmetrical, it is said to be asymmetrical or skewed.” It may happen that two distributions have the same mean and standard deviations. For example, see the following diagram.

Although the two distributions have the same means and standard deviations, they are not identical. Where do they differ? They differ in symmetry. The left-hand distribution is symmetrical, whereas the right-hand distribution is asymmetrical or skewed. For a symmetrical distribution, the values at equal distances on either side of the mode have equal frequencies. Thus the mode, median and mean all coincide. Its curve rises slowly, reaches a maximum (peak) and falls equally slowly (Fig. 1). But for a skewed distribution, the mean, mode and median do not coincide. Skewness is positive or negative according to whether the mean and median lie to the right or to the left of the mode. A positively skewed distribution curve (Fig. 2) rises rapidly, reaches the maximum and falls slowly. In other words, the tail as well as the median are on the right-hand side. A negatively skewed distribution curve (Fig. 3) rises slowly, reaches its maximum and falls rapidly. In other words, the tail as well as the median are on the left-hand side.

 Size   Frequency      Size   Frequency      Size   Frequency
 1      12             1      4              1      3
 2      13             2      6              2      7
 3      14             3      12             3      8
 4      15             4      10             4      10
 5      14             5      8              5      12
 6      13             6      7              6      6
 7      12             7      3              7      4

4.3 Difference between Skewness and Dispersion

Dispersion refers to the spread or variation of the items in a series, while skewness refers to the direction of that variation. Thus, with skewness we measure the lack of symmetry in the distribution. Skewness may be positive or negative, depending on whether the mean and the median lie to the right or to the left of the mode.

4.4 Tests of Skewness

1. The values of mean, median and mode do not coincide. The more the

difference between them, the more is the skewness.

2. Quartiles are not equidistant from the median, i.e. ( Q3 - Me ) ≠ ( Me - Q1 ).

3. The sum of positive deviations from the median is not equal to the sum of the negative deviations.

4. Frequencies are not equally distributed at points of equal deviation from

the mode.

5. When the data is plotted on a graph they do not give the normal bell-

shaped form.

4.5 Methods of measurement of Skewness

1. First measure of skewness

It is given by Karl Pearson.

Measure of skewness: Skp = Mean - Mode, i.e. Skp $= \bar{x} - Mo$

Co-efficient of skewness: J $= \dfrac{\bar{x} - Mo}{\text{s.d.}}$

If it is not possible to determine the mode (Mo) of a distribution, Pearson has suggested the use of the relation ( Mean - Mode ) = 3 ( Mean - Median ), so that

Skp $= 3(\bar{x} - Me)$. Thus J $= \dfrac{3(\bar{x} - Me)}{\text{s.d.}}$

Note : i) Although a co-efficient of skewness usually lies within ±1, Karl Pearson's co-efficient lies within ±3.

ii) If J = 0, then there is no skewness.

iii) If J is positive, the skewness is also positive.

iv) If J is negative, the skewness is also negative.

Unless otherwise indicated, you must use Karl Pearson's formula.

Example Find Karl Pearson's coefficient of skewness from the following data:

 Marks above    No. of students
 0              150
 10             140
 20             100
 30             80
 40             80
 50             70
 60             30
 70             14
 80             0
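One possible way of working this example is sketched below in Python. It is a sketch under stated assumptions, not the author's solution: the "marks above" figures are converted into class frequencies for classes of width 10, class mid-values represent each class, the median is found by interpolation at the n/2-th item, and the median-based form J = 3(mean - median)/s.d. is used because two classes share the highest frequency, so the mode is ill-defined. The helper name `grouped_median` is illustrative.

```python
import math

marks_above = [0, 10, 20, 30, 40, 50, 60, 70, 80]
students_above = [150, 140, 100, 80, 80, 70, 30, 14, 0]

# Class frequency = students above the lower limit minus students above the upper limit.
lower_limits = marks_above[:-1]
upper_limits = marks_above[1:]
freq = [a - b for a, b in zip(students_above[:-1], students_above[1:])]

n = sum(freq)
mid = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

mean = sum(f * m for f, m in zip(freq, mid)) / n
sd = math.sqrt(sum(f * m * m for f, m in zip(freq, mid)) / n - mean ** 2)

def grouped_median(lower_limits, upper_limits, freq):
    """Interpolate within the class that contains the n/2-th item."""
    target = sum(freq) / 2
    cumulative = 0
    for lo, hi, f in zip(lower_limits, upper_limits, freq):
        if f > 0 and cumulative + f >= target:
            return lo + (target - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("empty distribution")

med = grouped_median(lower_limits, upper_limits, freq)
j = 3 * (mean - med) / sd

print(freq, round(mean, 2), med, round(sd, 2), round(j, 2))
```

Under these assumptions the coefficient comes out negative, i.e. the distribution is negatively skewed.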

Note: You will generally find different values of J when it is calculated by Karl Pearson's and by Bowley's formula. But the value of J by Bowley's formula always lies within ±1.

Example The following table gives the frequency distribution of 291 workers of a factory according to their average monthly income in 1945-55.

 Income group (\$)    No. of workers
 Below 50            1
 50-70               16
 70-90               39
 90-110              58
 110-130             60
 130-150             46
 150-170             22
 170-190             15
 190-210             15
 210-230             9
 230 & above         10

Solution

 Income group    f      c.f.
 Below 50        1      1
 50 - 70         16     17
 70 - 90         39     56
 90 - 110        58     114
 110 - 130       60     174
 130 - 150       46     220
 150 - 170       22     242
 170 - 190       15     257
 190 - 210       15     272
 210 - 230       9      281
 230 & above     10     291

 n = Σf = 291

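Calculations :

1) Median = Size of the (n + 1)/2 th item = Size of the (291 + 1)/2 th item = Size of the 146th item, which lies in the (110-130) class interval.

Me $= l_1 + \dfrac{m - c}{f}\,(l_2 - l_1)$, where $l_1$ and $l_2$ are the limits of the median class, $m$ is the median item, $c$ is the cumulative frequency of the preceding class and $f$ is the frequency of the median class

$= 110 + \dfrac{146 - 114}{60} \times 20$

$= 110 + 10.67$

$= 120.67$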

Chapter Four: End Chapter Quizzes

1. For a positively skewed distribution, which of the following inequality is

 a- median > mode b- mode > mean c- mean > median d- mean > mode

2. For a negatively skewed distribution, the correct inequality is

 a- mode < median b- mean < median c- mean < mode d- none of the above

3. In case of a positively skewed distribution, the relation between mean, median and mode that holds is

 a- median > mean > mode b- mean > median > mode c- mean = median = mode d- none of the above

4. For a positive skewed frequency curve, the inequality that holds is

 a- Q1 +Q3 >2Q2 b- Q1 + Q2 > 2Q3 c- Q1 + Q3 > Q2 d- Q3 – Q1 > Q2

5. If a moderately skewed distribution has mean 30 and mode 36, the median of the distribution is

 a- 10 b- 35 c- 20 d- zero

6. First and third quartile of a frequency distribution are 30 and 75. Also its coefficient of skewness is 0.6. The median of the frequency distribution is

 a- 40 b- 39 c- 41 d- 38

7. For negatively skewed distribution, the correct relation between mean, median and mode is

 a- mean = median = mode b- median < mean < mode c- mean < median < mode d- mode < mean < median

8. In the case of positive skewed distribution, the extreme values lies in the

 a- left tail b- right tail c- middle d- any where

9. The extreme values in a negatively skewed distribution lie in the

 a- middle b- right tail c- left tail d- whole curve

10. Which of the following statements is true for measures of deviation?

 a- mean deviation does not follow algebraic rule b- range is a crudest measure c- coefficient of variation is a relative measure d- all the above statements

CHAPTER FIVE: CORRELATION

5.1 Introduction

So far we have considered only univariate distributions. From the averages, dispersion and skewness of a distribution, we get a complete idea about its structure. Many a time, we come across problems which involve two or more variables. If we carefully study the figures of rainfall and production of paddy, of accidents and motor cars in a city, of demand and supply of a commodity, or of sales and profit, we may find that there is some relationship between the two variables. On the other hand, if we compare the figures of rainfall in America and the production of cars in Japan, we may find that there is no relationship between the two variables. If there is any relation between two variables, i.e. when one variable changes the other also changes in the same or in the opposite direction, we say that the two variables are correlated.

W. J. King : If it is proved that in a large number of instances two

variables, tend always to fluctuate in the same or in the opposite direction

then it is established that a relationship exists between the variables. This is

called a "Correlation."

The correlation is one of the most common and most useful statistics. A

correlation is a single number that describes the degree of relationship

between two variables. Let's work through an example to show you how this

statistic is computed.

Correlation is a statistical technique that can show whether and how

strongly pairs of variables are related. For example, height and weight are

related; taller people tend to be heavier than shorter people. The relationship

isn't perfect. People of the same height vary in weight, and you can easily

think of two people you know where the shorter one is heavier than the taller

one. Nonetheless, the average weight of people 5'5'' is less than the average

weight of people 5'6'', and their average weight is less than that of people 5'7'',

etc. Correlation can tell you just how much of the variation in peoples'

weights is related to their heights.

Although this correlation is fairly obvious, your data may contain unsuspected correlations. You may also suspect there are correlations, but not know which are the strongest. An intelligent correlation analysis can lead to a greater understanding of your data.

Correlation means the study of the existence, magnitude and direction of the relation between two or more variables. Correlation is very important in technology and in statistics. The famous astronomer Bravais, Prof. Sir Francis Galton, Karl Pearson (who used this concept in Biology and in Genetics), Prof. Neiswanger and many others have contributed to this great subject.

5.2 Definitions :

“An analysis of the covariation of two or more variables is usually called correlation.”

A. M. Tuttle

“Correlation analysis attempts to determine the degree of relationship between variables.”

Ya Lun Chou

“The effect of correlation is to reduce the range of uncertainty of one's prediction.”

Tippett

5.3 Coefficient of Correlation

The main result of a correlation is called the correlation coefficient (or "r"). It ranges from -1.0 to +1.0. The closer r is to +1 or -1, the more closely the two variables are related.

If r is close to 0, it means there is little or no linear relationship between the variables.

If r is positive, it means that as one variable gets larger the other gets larger.

If r is negative, it means that as one gets larger, the other gets smaller (often called an "inverse" correlation).

While correlation coefficients are normally reported as r = (a value between -1 and +1), squaring them makes them easier to understand. The square of the coefficient (or r square) is equal to the percent of the variation in one variable that is related to the variation in the other. After squaring r, ignore the decimal point. An r of .5 means 25% of the variation is related (.5 squared = .25). An r value of .7 means 49% of the variance is related (.7 squared = .49).

A correlation report can also show a second result of each test: statistical significance. In this case, the significance level will tell you how likely it is that the correlations reported may be due to chance in the form of random sampling error. If you are working with small sample sizes, choose a report format that includes the significance level. This format also reports the sample size.

A key thing to remember when working with correlations is never to

assume a correlation means that a change in one variable causes a change in another. Sales of personal computers and athletic shoes have both risen strongly in the last several years and there is a high correlation between them, but you cannot assume that buying computers causes people to buy athletic shoes (or vice versa).

The second caveat is that the Pearson correlation technique works best with linear relationships: as one variable gets larger, the other gets larger (or smaller) in direct proportion. It does not work well with curvilinear relationships (in which the relationship does not follow a straight line). An example of a curvilinear relationship is age and health care. They are related, but the relationship doesn't follow a straight line. Young children and older people both tend to use much more health care than teenagers or young adults. Multiple regression (also included in the Statistics Module) can be used to examine curvilinear relationships, but it is beyond the scope of this article.

Correlation Example

Let's assume that we want to look at the relationship between two variables, height (in inches) and self esteem. Perhaps we have a hypothesis that how tall you are affects your self esteem (incidentally, I don't think we have to worry about the direction of causality here -- it's not likely that self esteem causes your height!). Let's say we collect some information on twenty individuals (all male -- we know that the average height differs for males and females so, to keep this example simple, we'll just use males). Height is measured in inches. Self esteem is measured based on the average of 10 1-to-5 rating items (where higher scores mean higher self esteem). Here's the data for the 20 cases (don't take this too seriously -- I made this data up to illustrate what a correlation is):


 Person    Height    Self Esteem
 1         68        4.1
 2         71        4.6
 3         62        3.8
 4         75        4.4
 5         58        3.2
 6         60        3.1
 7         67        3.8
 8         68        4.1
 9         71        4.3
 10        69        3.7
 11        68        3.5
 12        67        3.2
 13        63        3.7
 14        62        3.3
 15        60        3.4
 16        63        4.0
 17        65        4.1
 18        67        3.8
 19        63        3.4
 20        61        3.6

Now, let's take a quick look at the histogram for each variable:

And, here are the descriptive statistics:

 Variable       Mean     StDev       Variance    Sum     Minimum    Maximum    Range
 Height         65.4     4.40574     19.4105     1308    58         75         17
 Self Esteem    3.755    0.426090    0.181553    75.1    3.1        4.6        1.5

Finally, we'll look at the simple bivariate (i.e., two-variable) plot:

You should immediately see in the bivariate plot that the relationship between the variables is a positive one (if you can't see that, review the section on types of relationships) because if you were to fit a single straight line through the dots it would have a positive slope or move up from left to right. Since the correlation is nothing more than a quantitative estimate of the relationship, we would expect a positive correlation.

What does a "positive relationship" mean in this context? It means that,

in general, higher scores on one variable tend to be paired with higher scores

on the other and that lower scores on one variable tend to be paired with

lower scores on the other. You should confirm visually that this is generally

true in the plot above.
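The positive relationship can also be checked numerically. The following minimal Python sketch assumes the usual product-moment (Pearson) formula written with raw sums, r = (NΣxy - ΣxΣy) / sqrt([NΣx² - (Σx)²][NΣy² - (Σy)²]); the function name `pearson_r` is illustrative only.

```python
import math

heights = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
           68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

def pearson_r(x, y):
    """Product-moment correlation computed from the raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator

r = pearson_r(heights, self_esteem)
print(round(r, 2))   # about 0.73 for these 20 cases
```

A value of roughly 0.73 is positive and fairly strong, which agrees with the upward slope seen in the plot.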

5.4 Types of Correlation

5.4.1 Positive and negative correlation

A) If two variables change in the same direction (i.e. if one increases the other also increases, or if one decreases, the other also decreases), then this is called a positive correlation. For example : Advertising and sales.

B) If two variables change in the opposite direction (i.e. if one increases, the other decreases and vice versa), then the correlation is called a negative correlation. For example : T.V. registrations and cinema attendance.

5.4.2 Linear and non-linear correlation

The nature of the graph gives us the idea of the type of correlation between two variables. If the graph is a straight line, the correlation is called a "linear correlation", and if the graph is not a straight line, the correlation is non-linear or curvi-linear. For example, if variable x changes by a constant quantity, say 20, then y also changes by a constant quantity, say 4. The ratio between the two always remains the same (1/5 in this case). In case of a curvi-linear correlation this ratio does not remain constant.

5.5 Degrees of Correlation

Through the coefficient of correlation, we can measure the degree or extent of the correlation between two variables. On the basis of the coefficient of correlation we can also determine whether the correlation is positive or negative.

5.5.1 Perfect correlation: If two variables change in the same direction and in the same proportion, the correlation between the two is perfect positive. According to Karl Pearson the coefficient of correlation in this case is +1. On the other hand, if the variables change in the opposite direction and in the same proportion, the correlation is perfect negative. Its coefficient of correlation is -1. In practice we rarely come across these types of correlations.

5.5.2 Absence of correlation: If two series of two variables exhibit no relation between them, or a change in one variable does not lead to a change in the other variable, then we can firmly say that there is no correlation or absurd correlation between the two variables. In such a case the coefficient of correlation is 0.

5.5.3 Limited degrees of correlation: If two variables are not perfectly correlated, nor is there a perfect absence of correlation, then we term the correlation as limited correlation. It may be positive, negative or zero but lies within the limits ±1. High degree, moderate degree and low degree are the three categories of this kind of correlation. The following table reveals the effect (or degree) of the coefficient of correlation.

 Degrees                   Positive            Negative
 Absence of correlation    Zero                0
 Perfect correlation       + 1                 - 1
 High degree               + 0.75 to + 1       - 0.75 to - 1
 Moderate degree           + 0.25 to + 0.75    - 0.25 to - 0.75
 Low degree                0 to + 0.25         0 to - 0.25
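The table can be turned into a small lookup, sketched here in Python. The table does not say which category the boundary values 0.25 and 0.75 belong to; this sketch assumes they fall in the higher category, and the function name `degree_of_correlation` is illustrative.

```python
def degree_of_correlation(r):
    """Classify a coefficient of correlation according to the table above.

    Assumption: the boundary values 0.25 and 0.75 are placed in the higher category.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 0:
        return "absence of correlation"
    sign = "positive" if r > 0 else "negative"
    if size == 1:
        return f"perfect {sign}"
    if size >= 0.75:
        return f"high degree {sign}"
    if size >= 0.25:
        return f"moderate degree {sign}"
    return f"low degree {sign}"

for value in (1, 0.8, -0.5, 0.1, 0, -1):
    print(value, "->", degree_of_correlation(value))
```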

5.6 Techniques in Determining Correlation

There are several different correlation techniques. The Survey System's optional Statistics Module includes the most common type, called the Pearson or product-moment correlation. The module also includes a variation on this

type called partial correlation. The latter is useful when you want to look at the relationship between two variables while removing the effect of one or two other variables. Like all statistical techniques, correlation is only appropriate for certain kinds of data. Correlation works for quantifiable data in which numbers are meaningful, usually quantities of some sort. It cannot be used for purely categorical data, such as gender, brands purchased, or favorite color. Following are the techniques for determining the correlation :-

5.6.1 Rating Scales

Rating scales are a controversial middle case. The numbers in rating scales have meaning, but that meaning isn't very precise. They are not like quantities. With a quantity (such as dollars), the difference between 1 and 2 is exactly the same as between 2 and 3. With a rating scale, that isn't really the case. You can be sure that your respondents think a rating of 2 is between a rating of 1 and a rating of 3, but you cannot be sure they think it is exactly halfway between. This is especially true if you labeled the mid-points of your scale (you cannot assume "good" is exactly half way between "excellent" and "fair"). Most statisticians say you cannot use correlations with rating scales, because the mathematics of the technique assume the differences between numbers are exactly equal. Nevertheless, many survey researchers do use correlations with rating scales, because the results usually reflect the real world. Our own position is that you can use correlations with rating scales, but you should do so with care. When working with quantities, correlations

provide precise measurements. When working with rating scales, correlations provide general indications.

Calculating the Correlation

Now we're ready to compute the correlation value. The formula for the correlation is:

$r = \dfrac{N\sum xy - (\sum x)(\sum y)}{\sqrt{\left[N\sum x^2 - (\sum x)^2\right]\left[N\sum y^2 - (\sum y)^2\right]}}$, where N is the number of pairs of scores.

We use the symbol r to stand for the correlation. Through the magic of mathematics it turns out that r will always be between -1.0 and +1.0. If the correlation is negative, we have a negative relationship; if it's positive, the relationship is positive. You don't need to know how we came up with this formula unless you want to be a statistician. But you probably will need to know how the formula relates to real data -- how you can use the formula to compute the correlation. Let's look at the data we need for the formula. Here's the original data with the other necessary columns:

 Person Heigh Self x*y x*x y*y t (x) Esteem (y) 1 68 4.1 278.8 4624 16.81 2 71 4.6 326.6 5041 21.16 3 62 3.8 235.6 3844 14.44
 4 75 4.4 330 5625 19.36 5 58 3.2 185.6 3364 10.24 6 60 3.1