
UNIT III: Basic Statistics for Analysis and Interpretation of Assessment data

----------------------------------------------------------------------------------------------------------------------------------------------------

Role and importance of statistics in analyzing assessment data, Population and Sample.
Data, Types of Data - Primary & Secondary, Quantitative & Qualitative.
Classification of Data, Frequency Table (Grouped & Ungrouped).
Graphical Representation of Data - need and importance, Representing data using Bar Diagram and Pie Diagram, Histogram, Frequency Polygon, Frequency Curve and Ogives, Interpretation of graphical representations.
Descriptive Statistical Measures: Measures of Central Tendency - Mean, Median, Mode - concept and methods of finding each measure and when to use each measure.
Measures of Variability/Dispersion - Range, Mean Deviation, Quartile Deviation, Standard Deviation - concepts and methods of finding each measure and when to use each measure.
Correlation - meaning and importance, Concept of Coefficient of correlation, Types of Correlation - Positive, Negative, Zero and Perfect Correlation, Rank Difference Method of calculating Coefficient of correlation, interpretation of correlation.
----------------------------------------------------------------------------------------------------------------------------------------------
The word statistics is derived from the Latin word ‘Status’, which means a ‘Political State’. It was originally applied only to
such facts and figures as the state required for its official purposes. Statistics is a body of methods for making
wise decisions in the face of uncertainty. It embodies a methodology for the collection, classification, description
and interpretation of data obtained through surveys and experiments. In recent times statistics
has come to be used in two senses: as numerical data and as statistical method. In the first sense, the word statistics denotes
numerical data, that is, numerical descriptions of the quantitative aspects of things, which take the
form of counts or measurements. In the second sense, statistics refers to the principles and methods used in the collection, analysis
and interpretation of data.
Definition of statistics
“Statistics may be called the science of counting” - A. L. Bowley
“Statistics can be defined as the collection, presentation and interpretation of numerical data” - Croxton and
Cowden
Statistics as a subject or branch of knowledge is defined as one of the subjects of study that helps us in
the scientific collection, presentation, analysis and interpretation of numerical facts.
“Aggregates of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or
estimated according to reasonable standards of accuracy, collected in a systematic manner for a
predetermined purpose and placed in relation to each other” - Horace Secrist
The term statistics is used as a plural noun as well as a singular noun. In the plural it refers to
numerical data collected in a systematic manner with some definite aim or object in view. In the singular
it refers to the techniques and methods used in the collection, analysis and interpretation of data.

Characteristics
❖ Aggregate of facts
❖ Numerically expressed
❖ Affected to a marked extent by multiplicity of causes and not by a single cause
❖ Collected in a systematic manner
❖ Collected for a predetermined purpose
❖ Placed in relation to each other
❖ Maintained to a reasonable standard of accuracy
Functions (Steps of statistical analysis)
➢ Collection
➢ Classification
➢ Tabulation
➢ Analysis
➢ Interpretation
➢ Comparison

Importance of statistics
1. Statistics in business - Statistics is extensively used in modern business activities. A businessman must
make a proper analysis of past records to forecast future business conditions. Every businessman has to
make use of statistical tools to estimate the trend of prices and of economic activities.
2. Statistics and the state - Statistics are the eyes of the state, as they help in administration. The state conducts
the population census and estimates the figures of national income and the prosperity of the country.
3. Statistics in economic planning - In India various plans have been prepared and implemented. The
National Sample Survey Scheme was introduced to collect statistical data for use in planning.
4. Importance in defence and war - Statistical tools are very useful in the field of defence and war because
they help to compare the military strength of different countries in terms of manpower, tanks, war
aeroplanes, missiles etc. They also help in planning the future military strategy of the country, in estimating the
losses due to war and in arranging war finance.
5. Importance in research - In the field of industry and commerce, research is carried out to find
the causes of variation in different products.
6. Importance in physical science - In sciences like physics, chemistry, botany etc. a
large number of measurements are taken which are found to vary from actual results.
7. Statistical methods are vital in all educational problems - Books dealing with educational science,
educational articles in magazines and educational surveys are replete with statistics. If a teacher wants to learn
these matters, he or she must be familiar with statistical terminology.
8. Statistics in mathematics - The accuracy of conclusions based on statistical methods can be easily tested
and verified.
9. Study or comparison of groups of individuals - It is not possible to draw general conclusions merely
by examining a large set of individual scores. In such a case certain representative values or
norms have to be calculated.
Other uses of Statistics are
i. Statistics has developed powerful tools which enable us to make valid inferences regarding the
characteristics of a population by studying only a representative part of it, called a sample.
ii. A huge amount of quantitative information may be collected in reasonable time at minimum expense
with the desired degree of accuracy using statistical methods.
iii.For a physician to test the effectiveness of a new drug.
iv.For a political commentator of a country in a future date.
v.For a sociologist to forecast the population of a country in a future date.
vi.To enable the investigator find ratios, proportions etc.

Population A population is the aggregate of all the units under study in any field of enquiry. It is a
collection of individuals or of their values which can be numerically specified. It is also called the
universe. A population can be finite or infinite.
Sample A finite subset of a population, selected from it with the objective of investigating its
properties, is called a sample of that population. A sample is selected in such a manner that it represents
the population. It is a miniature model or replica of the population. The representative portion of the
population is called a sample. The sample must be of sufficient size to warrant statistical analysis.
Sampling Sampling is the process by which a relatively small number of individuals or measures of
individuals, objects or events is selected and analyzed in order to find out something about the entire
population from which it was selected.
It helps to reduce expenditure, save time and energy, permit measurement of greater scope, and produce
greater precision and accuracy. Sampling procedures provide generalizations on the basis of a relatively
small proportion of the population.
Methods of sampling
Probability sampling or random sampling It is based on a known probability of selection for each item.
Also known as chance sampling.
Non-probability sampling It is that sampling which does not afford any basis for estimating the
probability of each item being included in the sample.
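As an illustration of probability sampling, the following is a minimal sketch of simple random sampling without replacement, one common form of random sampling. The roll numbers, the sample size and the function name `simple_random_sample` are hypothetical and chosen only for the example.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw a simple random sample without replacement.

    Every unit of the population has the same chance of being selected.
    """
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, sample_size)


# Hypothetical population: roll numbers of 40 pupils in a class.
roll_numbers = list(range(1, 41))
sample = simple_random_sample(roll_numbers, 8, seed=1)

# With simple random sampling each pupil is included with probability n / N.
inclusion_probability = 8 / len(roll_numbers)

print(sorted(sample))
print(inclusion_probability)
```

Because the probability of inclusion is known for every unit (here 8/40), results from such a sample can be generalized to the population, which is not the case for non-probability sampling.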
Differences between population and sample
Population | Sample
Refers to the collection of all elements possessing common characteristics, which comprises the universe. | Means a subgroup of the members of the population chosen for participation in the study.
The target population is the total group of individuals from which the sample might be drawn. | A sample is the group of people who take part in the investigation. The people who take part are referred to as “participants”.
Generally a large group. | A part of the population, so comparatively smaller.
Includes each and every unit of the group. | Includes only a handful of units of the population.
Data collection utilizes complete enumeration or census. | Data collection utilizes sample survey or sampling.
Focus is on identifying the characteristics. | Focus is on making inferences about the population.
A characteristic is called a parameter. | A characteristic is called a statistic.
Difficult to study. | Easier to study.


DATA AND TYPES OF DATA Statistics is the study of the collection, organization, analysis, interpretation and
presentation of data. The first step in statistical work is to obtain data. Data constitute the foundation of
statistical analysis and interpretation.
• Data denotes raw facts and figures. Data can be defined as a collection of facts or information from
which conclusions may be drawn.
Selection of an Appropriate Method for Collection of Data
• Nature and scope of enquiry
• Availability of financial resources
• Availability of time
• Degree of accuracy desired
• Status of the investigator
• Education and level of the respondents
Classification of data
On the basis of who collects the data, data can be classified into two types:
• Primary data - Primary data are those data which are collected for the first time and are original in
character. Primary data are in the shape of raw materials from which the investigator draws
conclusions by applying statistical methods for analysis and interpretation.
“By primary data we mean those data which are original, that is, those in which little or no grouping has
been made, the instance being recorded or itemized as encountered. They are essentially raw materials”
- HORACE SECRIST
ADVANTAGES OF PRIMARY DATA
• They are first-hand information
• The data collected are reliable as they are collected by the investigator himself
• Primary data are useful for knowing the opinions, qualities and attitudes of respondents
DISADVANTAGES OF PRIMARY DATA
• Expensive and time consuming
• Scope for personal bias
• Selection of a representative sample is not an easy task
Methods Used For Collecting Primary Data
• Observation method
• Interview method
• Questionnaire method
• Schedule method
Secondary data - Secondary data are those which have been collected by some other person for his own purpose
and published. They are in the shape of finished products. “Secondary data are those already in existence
and which have been collected for some other purpose than answering of the question in hand” -
M. M. BLAIR
Advantages Of Secondary Data
• The information can be collected at the least cost
• The time required for obtaining the information is very short
• A large quantity of data is available
• It helps the researcher in defining the problem and formulating hypotheses
• It helps in interpreting the primary data with more insight
Disadvantages Of Secondary Data
• May be inappropriate and inadequate for the purpose in hand
• May be inaccurate and unreliable
• May contain certain errors
Sources of secondary data
• External - personal and public
• Internal
• Official reports of central, state and local governments
• Official publications of foreign governments and international bodies like the UNO and its subordinate bodies
• Reports and publications of trade associations, banks, cooperative societies and similar semi-government and
autonomous organisations
• Publications of research organisations, centres and institutes, and reports submitted by
economists, research scholars etc.
• Technical journals, newspapers, books, periodicals etc.
Difference Between Primary And Secondary Data
Primary data | Secondary data
Primary data are original in character. | Secondary data are not original.
Primary data are in the form of raw material. | Secondary data are in the form of finished product.
The collection of primary data requires a large sum of money, energy and time. | Secondary data are easily available from secondary sources.
Primary data become secondary data after use. | Secondary data cannot be converted into primary data after use.
Precautions are not necessary in the use of primary data. | Precautions are necessary in the use of secondary data.
They can be collected by different methods, viz. the observation, interview, questionnaire and schedule methods. | They can be collected by copying from published and unpublished sources.

During the process of assessment or research a large amount of information is gathered, which can be
either qualitative or quantitative. On the basis of measurement, data can be classified into two types:
Qualitative data - Qualitative data is a categorical measurement expressed not in terms of numbers
but by means of a natural language description. When a person collects data in qualitative terms,
the assessment is called qualitative. Qualitative observations are defined as any observation made using
the five senses. Because people often reach different interpretations when using only their senses,
qualitative evaluation is harder to reproduce with accuracy; two individuals collecting data
about the same thing may end up with different or conflicting results. In research and business,
qualitative data may involve value judgments and emotional responses. An example of
qualitative data is "Our company created more visually compelling projects last year than this year."
Qualitative data is more concerned with detailed descriptions of situations or performance; it
can therefore be much more subjective, but can also be much more valuable in the hands of an experienced
person. Methods of qualitative data collection rely on descriptions rather than numbers. The data
collected are analyzed not by quantitative methods but by interpretive criteria. Informal
methods like observation, interview, field notes, diary, document collection, anecdotes etc. are used.
Examples: Description of a procedure or skill demonstrated by a student (based on observation),
feedback on a demonstration or skill test, on a case study or written assignment, etc.
Quantitative data - Quantitative data is a numerical measurement, expressed not by means of a
natural language description but in terms of numbers. When a person collects data in
quantitative terms, the assessment is called quantitative. Quantitative observations are made using
scientific tools and measurements. The results can be measured or counted, and any other person trying
to quantitatively assess the same situation should end up with the same results. An example of a
quantitative evaluation would be "This year our company had a total of 12 clients and completed 36
different projects, an average of three projects per client." Quantitative assessment includes methods that rely on numerical
scores or ratings, and the collected data can be analyzed using quantitative methods. The
collection, analysis and interpretation of the data are all in terms of numbers.
Quantitative data collection uses values from an instrument based on a standardized system, where the
data collected are limited to a selected or predetermined set of possible responses. Data are
collected using more formal methods like tests, questionnaires, inventories, rating scales etc. Examples:
Number of correct responses on a test, ratings on an end-of-term course evaluation, number of steps
missed during a skill or procedure demonstration.
Quantitative data | Qualitative data
Collection and analysis of data in quantitative terms. Data are collected and analyzed in terms of numbers (numerical data). Raw data are numbers. | Collection and analysis of data in qualitative terms. Data are collected and analyzed in terms of descriptions (narrative data). Raw data are words.
More objective in nature | More subjective in nature
Uses numerical score or rating | Uses detailed descriptions of situations or performance
Can be considered as an analytical approach | Can be considered as a holistic approach
Uses more structured and well constructed methods of data collection (formal and rigid) | Method of data collection is mostly unstructured (informal and flexible)
Response freedom is limited | More freedom of response
Objective scoring | Judgmental scoring
Easier analysis is possible and group generalizations can be arrived at | Difficult to analyze and to arrive at generalizations
Gives insight into the child's cognitive and affective attainments and skills | Gives insight into the other behavioural characteristics
A person should use both qualitative and quantitative assessment for the complete evaluation of the
pupil. Both quantitative and qualitative data have their benefits, though one is usually more appropriate
than the other in any given situation. The two supplement each other. For example, a student's score on an
attitude scale can be corroborated by data collected through observation.

CLASSIFICATION OF DATA Classification is a technique with the help of which the collected data are divided into various
groups. It helps to reduce the complexity of the data, to facilitate understanding, to facilitate
comparison, and to aid analysis and interpretation.
A classification of data should be
1 Clearly understood.
2 Stable.
3 Flexible.
4 Clearly defined.
5 Such that qualities or attributes are expressed quantitatively.
TYPES OF CLASSIFICATION
• Geographical {population distribution}
• Qualitative {sex, colour, literacy etc.}
• Quantitative {height, weight, marks, income}
• Chronological {time period}
TABULATION OF DATA - “Tabulation is a process of an orderly arrangement of data in
columns and rows”. - BLAIR.
Tabulation of data is done for the systematic presentation of statistical data, for presenting the problem
briefly and simply, for facilitating interpretation, for presenting the data in the form of graphs, charts,
diagrams etc., and for helping comparative study.

FREQUENCY DISTRIBUTION
A frequency distribution is an arrangement of the values that one or more variables take in a sample. Each entry
in the table contains the frequency, that is, the number of times a value occurs. A frequency distribution has a minimum of two columns: the leftmost one
lists the values of the variable found in the data and the next gives the frequency of each value.
TYPES OF FREQUENCY DISTRIBUTION
GROUPED FREQUENCY DISTRIBUTION - When there is a large number of scores, it is useful to group them
into a manageable number of intervals by creating intervals of equal width and computing the frequency
of scores that fall into each interval. Such a distribution is called a grouped frequency distribution.

UNGROUPED FREQUENCY DISTRIBUTION - If the number of distinct values the variable takes is small, classification can
be done by preparing a table which has no classes and gives only the frequency of each value. Such a table
is called an ungrouped frequency distribution.
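An ungrouped frequency distribution of the kind just described can be sketched in a few lines. The marks below are hypothetical, and `ungrouped_frequency_table` is only an illustrative name.

```python
from collections import Counter


def ungrouped_frequency_table(scores):
    """Return (value, frequency) pairs sorted by value."""
    counts = Counter(scores)  # counts how often each distinct value occurs
    return sorted(counts.items())


# Hypothetical marks of ten pupils in a short test.
marks = [7, 5, 8, 7, 6, 5, 7, 9, 6, 7]
table = ungrouped_frequency_table(marks)

# Two columns: the value on the left, its frequency on the right.
for value, frequency in table:
    print(value, frequency)
```

The frequencies in the right-hand column always add up to the number of observations.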

CONSTRUCTION OF A FREQUENCY DISTRIBUTION TABLE


• First decide the number of classes to include.
• Find the class width.
• Find the class limits.
• Make a tally mark for each entry.
• Count the tally marks to find the total frequency for each class.
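The steps above can be sketched as follows. This is one possible way of carrying them out, assuming whole-number scores and inclusive class limits; the rule used for the class width (range plus one, divided by the number of classes and rounded up) is an assumption of this example, as are the scores themselves.

```python
import math


def grouped_frequency_table(scores, number_of_classes):
    """Build a grouped frequency table with inclusive integer class limits."""
    lowest, highest = min(scores), max(scores)
    # Class width: cover the whole range with the chosen number of classes.
    width = math.ceil((highest - lowest + 1) / number_of_classes)
    table = []
    for k in range(number_of_classes):
        lower = lowest + k * width       # lower class limit
        upper = lower + width - 1        # upper class limit (inclusive)
        # Counting the scores inside the limits replaces the tally marks.
        frequency = sum(1 for s in scores if lower <= s <= upper)
        table.append((lower, upper, frequency))
    return table


# Hypothetical scores of 16 pupils.
scores = [12, 15, 21, 24, 25, 28, 31, 33, 34, 36, 38, 40, 41, 45, 47, 49]
table = grouped_frequency_table(scores, 4)

for lower, upper, frequency in table:
    print(f"{lower}-{upper}: {frequency}")
```

A different choice of the number of classes or of the first lower limit gives a different table for the same scores.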
ADVANTAGES
• Data become comprehensible when arranged as a frequency distribution, so they can be understood easily.
• It makes comparisons easier.
• Raw data cannot readily be represented in graphical form; a frequency distribution is needed to prepare graphs.
• It attracts the attention of even a layman and gives him an insight into the nature of the observations.
• It helps further statistical analysis of the data.

DISADVANTAGES
• If the frequency distribution is grouped, the identity of the individual observations is lost.
• The selection of the class interval and of the lower bound of the first class is to a certain extent arbitrary, so
different frequency tables into which the same data are classified may give contradictory impressions.
GRAPHICAL REPRESENTATION OF DATA
Graphical representation of data means the pictorial representation of data. Graphic
representation is the geometrical image of a set of data. It is a mathematical picture. It enables us to think
about a statistical problem in visual terms. It is a creative process that combines art and technology to
communicate ideas. The graphic representation of
data is an effective and economical device for the presentation, understanding and interpretation
of collected statistical data. Complicated data can be understood easily through a diagram or graph.
Different types of graphs are used in data representation. Some of them are listed below:
For ungrouped data or discrete data
• Line graph
• Bar graph
• Pie graph
• Pictogram
For grouped data
• Histogram
• Frequency Curve
• Frequency Polygon
• Ogive

MERITS OF GRAPHICAL REPRESENTATION OF DATA


• Data can be presented in a more attractive and an appealing form.
• It provides a more lasting effect on the brain.
• Comparative analysis and interpretations may be effectively and easily made.
• Valuable statistics like median, mode, quartiles may be easily computed.
• Such representation may help in the proper estimation, evaluation and interpretation of the
characteristics of items and individuals.
• It carries a lot of communication power.
• Graphical representation helps in forecasting, as it indicates the trend of the data in the past.
• Acceptability
• Easy to remember
• Facilitates comparison
• Easy to understand
• A complete data
• Use in the notice board
• Fewer errors and mistakes
• Saves considerable time
• Self explanatory
• Helpful even for less literate audience
DEMERITS OF GRAPHICAL REPRESENTATION OF DATA
• Lack of accuracy
• Subjective
• Misleading conclusions
• Presents only the approximate values
• Presents only the limited amount of information
• Can be confusing with the increase in no. of variables
• Not helpful in analyzing the data.
• Once the graph is constructed the identity of each observation is lost.
RULES FOR THE CONSTRUCTION OF GRAPH
• Every graph must have a suitable title.
• The graph must suit the size of the paper.
• Footnotes should be given at the bottom to illustrate the main points about the graph.
• Graph should be as simple as possible.
• In order to show many items in a graph, index for identification should be given.
• A graph should be neat and clean.
• Every graph should be accompanied by a table so that the reader can check whether the data have been presented
accurately.
• The test of a good graph is the ease with which the observer can interpret it. Economy
in cost and energy should also be exercised in drawing the graph.

BAR DIAGRAM
➢ A bar diagram or a bar graph displays data visually and is sometimes called a bar chart.
➢ Data is displayed either horizontally or vertically.
➢ Displays all kinds of information.
➢ Helps to make generalization and conclusion more quickly and easily.
➢ Bar graph will have a label, axis scales, and bars.
TYPES OF BAR DIAGRAM
a) Simple bar diagrams. Horizontal or vertical bars with the same width drawn with their bases on the
same horizontal or vertical line with equal gaps in between and lengths proportional to the magnitude
of the observations.

b) Subdivided bar diagrams (Component Bar Chart). First a simple bar diagram is drawn with the lengths
of the bars proportional to the totals of the component parts. Each bar is then subdivided into parts of length
proportional to the component magnitudes, and each part is given a different color or shading. Used
when the observations have different components and a comparison of the component parts is
needed.

c) Percentage bar diagrams. This is the modification of the sub divided bar diagram. Here the component
parts are expressed as the percentages of the total and a component bar diagram is drawn with all bars
having equal length.
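To make the conversion used in a percentage bar diagram concrete, here is a small sketch. The result categories and counts for a single bar are hypothetical, and `percentage_components` is only an illustrative name.

```python
def percentage_components(components):
    """Express each component of one bar as a percentage of the bar's total."""
    total = sum(components.values())
    return {name: 100 * value / total for name, value in components.items()}


# Hypothetical results of one class of 40 pupils (one bar of the diagram).
class_a = {"Distinction": 10, "Pass": 25, "Fail": 5}
percentages = percentage_components(class_a)

for name, share in percentages.items():
    print(f"{name}: {share}%")
```

Since the percentages of every bar add up to 100, all bars are drawn to the same length and only the subdivisions differ.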

d) Multiple bar diagrams. Grouped bars are used to represent related sets of data. For
example, imports and exports of a country together are shown in multiple bar chart. Each bar in a
group is shaded or coloured differently for the sake of distinction. Used for representing two or more
interrelated data for facilitating comparison.

e) Deviation bar diagrams. Used to represent net quantities like net profit, balance payable, deficit, etc.
Base line is drawn in the middle of the paper horizontally and positive values are indicated by bars of
proportional length drawn above the horizontal line and negative by bars of proportional length drawn
below the horizontal line.
PIE DIAGRAM
• Pie diagrams or pie charts are circles drawn to represent statistical data. The data are represented
through the sectors of a circle. A pie diagram brings out the relative importance of the various
components. For drawing a pie diagram, we construct a circle of any diameter and divide it into
sectors. An angle of 360 degrees represents 100 percent, and the corresponding angle for each
component is found by multiplying 360 degrees by the component's share of the total (its percentage divided by 100).
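The angle calculation for a pie diagram can be sketched as below. The budget heads and amounts are hypothetical, and `pie_angles` is only an illustrative name.

```python
def pie_angles(components):
    """Return the sector angle, in degrees, for each component."""
    total = sum(components.values())
    # Each component receives the same share of 360 degrees as of the total.
    return {name: 360 * value / total for name, value in components.items()}


# Hypothetical monthly budget of a household (amounts in any unit).
budget = {"Food": 40, "Rent": 30, "Education": 20, "Other": 10}
angles = pie_angles(budget)

for name, angle in angles.items():
    print(f"{name}: {angle} degrees")
```

A quick check on any pie diagram is that the sector angles add up to 360 degrees.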

HISTOGRAM
A histogram is a graphical display of a frequency distribution. The term histogram was coined by Karl
Pearson in 1895 for a common form of graphic representation. A histogram is a graphic
representation of a continuous frequency distribution through a special kind of vertical bar chart. There are no
gaps between the bars. The scale on the x axis must be continuous, the upper boundary of one class coinciding
with the lower boundary of the next class. In the histogram, the class intervals should be in the exclusive form. If
the class intervals are in the inclusive form, they should be converted into the exclusive form.
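The conversion from inclusive to exclusive form is not spelled out in these notes. The sketch below uses the usual convention, stated here as an assumption: half of the gap between consecutive classes is subtracted from each lower limit and added to each upper limit. The class limits are hypothetical.

```python
def to_exclusive(inclusive_classes):
    """Convert inclusive class limits to exclusive (true) class boundaries."""
    # Gap between the upper limit of one class and the lower limit of the next.
    gap = inclusive_classes[1][0] - inclusive_classes[0][1]
    half_gap = gap / 2
    return [(lower - half_gap, upper + half_gap)
            for lower, upper in inclusive_classes]


# Hypothetical inclusive classes: 10-19, 20-29, 30-39.
inclusive = [(10, 19), (20, 29), (30, 39)]
exclusive = to_exclusive(inclusive)

for lower, upper in exclusive:
    print(f"{lower} - {upper}")
```

After the conversion the upper boundary of each class coincides with the lower boundary of the next, so the bars of the histogram touch.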
FREQUENCY POLYGON A frequency polygon is a graph of frequency distribution. It is an improvement over the
histogram. It is constructed either after drawing a histogram or without drawing a histogram. In the frequency
polygon, midpoints of all the class intervals are taken and frequencies corresponding to the midpoints are
marked. The points of frequencies are joined through straight lines to get frequency polygon.
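The points to be joined in a frequency polygon can be computed as in this sketch. The classes and frequencies are hypothetical, and `polygon_points` is only an illustrative name.

```python
def polygon_points(classes, frequencies):
    """Return (midpoint, frequency) pairs, the vertices of the polygon."""
    return [((lower + upper) / 2, frequency)
            for (lower, upper), frequency in zip(classes, frequencies)]


# Hypothetical exclusive classes and their frequencies.
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [3, 7, 5, 2]
points = polygon_points(classes, frequencies)

for midpoint, frequency in points:
    print(midpoint, frequency)
```

Joining these points in order with straight lines gives the frequency polygon.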

FREQUENCY CURVE A continuous frequency distribution represented by a smoothed curve is known as a
frequency curve. The midpoints of the classes are taken along the x axis and the frequencies along the y axis. The
points thus plotted are joined by a free-hand smooth curve.
OGIVE - Ogives are graphs of cumulative frequency distributions drawn on a natural scale to determine the values
of certain measures like the median, quartiles, deciles etc. Class limits are shown along the x axis and the cumulative
frequencies along the y axis. There are two types of ogives:

LESS THAN OGIVE - In a less than ogive we start with the upper limits of the classes and go on adding the
frequencies. When these cumulative frequencies are plotted, we get a rising curve.

MORE THAN OGIVE - In a more than ogive we start with the lower limits of the classes and, beginning from the total
frequency, subtract the frequency of each class in turn. When these cumulative frequencies are plotted, we get a declining
curve.
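The cumulative frequencies plotted in the two ogives can be sketched as follows. The classes and frequencies are hypothetical, and the function names are only illustrative.

```python
from itertools import accumulate


def less_than_ogive(classes, frequencies):
    """Pair each upper limit with the number of scores below it."""
    cumulative = accumulate(frequencies)  # running total of the frequencies
    return [(upper, cf) for (_, upper), cf in zip(classes, cumulative)]


def more_than_ogive(classes, frequencies):
    """Pair each lower limit with the number of scores at or above it."""
    remaining = sum(frequencies)  # start from the total frequency
    points = []
    for (lower, _), frequency in zip(classes, frequencies):
        points.append((lower, remaining))
        remaining -= frequency  # subtract the frequency of each class in turn
    return points


# Hypothetical exclusive classes and their frequencies.
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [3, 7, 5, 2]

less_than = less_than_ogive(classes, frequencies)
more_than = more_than_ogive(classes, frequencies)
print(less_than)
print(more_than)
```

The first list rises to the total frequency and the second falls from it, which is why the two curves are rising and declining respectively.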
Measures of central tendency: For a given large set of data we usually find that there are very
few persons with very high or very low scores. Most persons' scores lie between the
highest and the lowest scores. This tendency of the distribution to cluster around the middle value is
called central tendency, and the typical score around which most of the scores cluster, or the value
between the extreme scores that is shared by most of the persons, is referred to as a measure of central
tendency. It is a measurement of data that indicates where the middle of the information lies. Tate
(1955) defines a measure of central tendency as “a sort of average or typical value of the items in the
series and its function is to summarize the series in terms of this average value.” There are
three common measures of central tendency: the arithmetic mean (or mean), the median
and the mode.
Some of the common uses of a measure of central tendency are
• Each of them is a representative characteristic of the whole group. The performance of the
group as a whole can be described by a measure of central tendency, in its own way.
• They help in the comparison of two or more groups and samples in terms of their typical
performance.
• They indicate where the center of the distribution tends to be located.
• They tell us about the shape and nature of the distribution (for a normal distribution mean =
median = mode).
• They give us a concise picture of large data.
• They give a general picture of the whole group by use of the sample data alone.
• They help to find the mathematical relationship between different groups.

Characteristics of a good average
• Should be a stable, reliable and accurate measure.
• Should be a representative value of the distribution.
• Its meaning and definition should be easily understood and it should be easy to calculate.
• Should be capable of further algebraic treatment.
• Should not be affected by fluctuations in sampling.
• Should be usable for further statistical analysis.
• Should depend on all values of the distribution.
• Should not be affected by extreme items.
Mean: The score located at the mathematical center of a distribution is called the mean or arithmetic
mean. The mean is the most common and useful measure of central tendency. It is simply the sum of the
scores divided by the number of scores in a set of data. This is also known as the average. It is
represented by the symbol M or x̄. It is calculated using the formulas
$$M = \frac{\sum X}{N}$$ for ungrouped data, and
$$M = \frac{\sum fX}{N}$$ for grouped data,
where X is the individual score (for grouped data, the midpoint of the class),
f is the individual frequency,
M is the mean,
N is the number of observations, or $N = \sum f$.
Formula for the short-cut method: $$M = A + \frac{\sum fd}{\sum f} \times i$$, where A is the assumed mean,
$d = \frac{X - A}{i}$ and i is the class width.
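The three formulas for the mean can be checked against each other with a short sketch. The scores, class midpoints, frequencies and the choice of assumed mean below are hypothetical, and the function names are only illustrative.

```python
def mean_ungrouped(scores):
    """M = sum of X / N."""
    return sum(scores) / len(scores)


def mean_grouped(midpoints, frequencies):
    """M = sum of fX / N, with X taken as the class midpoint."""
    n = sum(frequencies)
    return sum(f * x for x, f in zip(midpoints, frequencies)) / n


def mean_shortcut(midpoints, frequencies, assumed_mean, class_width):
    """M = A + (sum of fd / sum of f) * i, with d = (X - A) / i."""
    n = sum(frequencies)
    deviations = [(x - assumed_mean) / class_width for x in midpoints]
    sum_fd = sum(f * d for f, d in zip(frequencies, deviations))
    return assumed_mean + (sum_fd / n) * class_width


# Hypothetical ungrouped scores.
print(mean_ungrouped([4, 6, 8, 10, 12]))

# Hypothetical grouped data: classes 10-20, 20-30, 30-40, 40-50.
midpoints = [15, 25, 35, 45]
frequencies = [2, 3, 4, 1]
direct = mean_grouped(midpoints, frequencies)
shortcut = mean_shortcut(midpoints, frequencies, assumed_mean=25, class_width=10)
print(direct, shortcut)
```

Whatever midpoint is taken as the assumed mean A, the short-cut method gives the same value as the direct formula; it only makes the hand calculation lighter.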
When mean is used
• The mean is the most stable, reliable and accurate measure of central tendency and is least affected
by fluctuations in sampling. So when such a measure is needed we compute the mean.
• Used to summarize interval or ratio data in situations when the distribution is symmetrical and
unimodal.
• When we need to compute further statistics we use the mean, as the mean is capable of further algebraic
treatment.
• Avoid using the mean when extreme items seriously affect the average score.
Advantages of Mean
• Most widely and commonly used measure of central tendency.
• Most stable and accurate measure of central tendency.
• Its meaning and definition are easily understood.
• It best conveys the idea of an average value.
• It can be calculated with any arrangement of the scores.
• It is derived using the exact scores of the items in the series.
• It gives equal weightage to every item in the series, whether extreme or not.
• It is capable of further algebraic treatment and is used for further statistical analysis.
• It is not affected much by fluctuations in sampling.
• A distribution has only one mean.
• The total score and the combined mean can be obtained if the means of the constituent units are known.
Limitations of Mean
• It cannot be easily located from a graph.
• It is affected by the value of each item, so extreme values easily affect it.
• It is most suitable only when the distribution is normal and not skewed.
• It cannot be calculated for open ended classes.
• It cannot be used in the case of nominal and ordinal data.
Median: Median is the number present in the middle when the numbers in a set of data are arranged in
ascending or descending order. If the number of scores in a data set is odd then median is the middle
value and if even, then the median is the mean of the two middle numbers. We can also say that median
is the point on the score scale or distribution below and above which half (50%) of the scores fall i.e.,
the score at the 50th percentile, (in the middle). It is the score that divides the distribution into two equal
parts. Note that the central item is not the median but the value of the central item is the median.
Median is also known as the middle quartile or the second quartile (Q2).
Computing Median for ungrouped data:
Arrange the items in ascending or descending order.
When N is odd, the value of the [(N+1)/2]th item is the median.
When N is even, the median is the average of the values of the (N/2)th item and the
[(N/2)+1]th item.
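The two rules above can be sketched in a few lines of Python. This is only an illustration; the function name `median_ungrouped` and the sample scores are made up for the example.

```python
def median_ungrouped(scores):
    """Median of raw scores using the (N+1)/2 and N/2 item rules."""
    ordered = sorted(scores)  # arrange the items in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        # odd N: value of the [(N+1)/2]th item (lists are 0-based, hence -1)
        return ordered[(n + 1) // 2 - 1]
    # even N: average of the (N/2)th and [(N/2)+1]th items
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median_ungrouped([7, 3, 9, 5, 1]))      # 5
print(median_ungrouped([7, 3, 9, 5, 1, 11]))  # 6.0
```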
Computing Median for grouped data: First write the true class limits and the cumulative
frequencies of the classes. Then locate the median class, i.e., the class containing the (N/2)th item, in
the same manner as for ungrouped data. The median for grouped data is then calculated using the formula
$$M_d = l + \left[\frac{\frac{N}{2} - F}{f}\right] i$$
Where
l - Exact lower limit of the median class
F – Cumulative frequency of the classes below the median class (i.e., up to its exact lower limit)
f – Frequency of the median class
i – Class interval
N – Total frequency ($N = \sum f$)
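As a rough sketch, the grouped-data median formula can be coded as below. The function name `grouped_median`, the input layout (a list of exact lower limit and frequency pairs in ascending order) and the frequency table are assumptions made for illustration; F is taken as the cumulative frequency of the classes below the median class.

```python
def grouped_median(classes, width):
    """classes: list of (exact_lower_limit, frequency) in ascending order.
    width: the class interval i (assumed equal for all classes)."""
    n = sum(f for _, f in classes)  # N = sum of frequencies
    half = n / 2
    cum = 0  # F: cumulative frequency below the current class
    for lower, f in classes:
        if f > 0 and cum + f >= half:
            # this is the median class: Md = l + ((N/2 - F) / f) * i
            return lower + ((half - cum) / f) * width
        cum += f
    raise ValueError("no frequencies given")


# Hypothetical frequency table: classes 10-19, 20-29, ... written with exact lower limits
table = [(9.5, 4), (19.5, 6), (29.5, 10), (39.5, 7), (49.5, 3)]
print(grouped_median(table, 10))  # 34.5
```

Here N = 30, so N/2 = 15 falls in the class starting at 29.5, with F = 10 and f = 10.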
When to use median
 Used to summarize ordinal or highly skewed interval or ratio scores
 When we have to get the exact mid-point of the distribution median is computed.
 When a series contains extreme measures median is a more representative measure than mean.
 In the case of open ended distributions computation of mean is impossible so median is more
reliable.
 When we have to calculate a measure of central tendency from a graph median is the most
suitable.
 Median is used specifically for qualities like health, honesty, intelligence etc. that cannot
be measured directly in quantitative terms but can be ranked.
Advantages of Median
 It is easily understood and determined and located with greater exactness than mode.
 Median is a better measure of central tendency than mode.
 Only one score can be the median.
 It is the most representative measure of central tendency when the distribution contains extreme
scores.
 It is useful in the case of open ended classes and skewed distributions.
 It will always be around where the most scores are.
 It can be calculated even if a value is missing if its relative position is known.
 It can be computed from a graph.
Limitations of Median
 It is a non-algebraic measure. We cannot calculate the total score or the combined median etc.
 It is a less dependable measure of central tendency than mean.
 It is not used in higher statistical analysis.
 It cannot be used in the case of nominal data.
Mode: Mode is the value that occurs most frequently in a set of data. It is typically useful in describing
the central value when the scores reflect a nominal scale of measurement. It is the point on the scale
that corresponds to the maximum frequency of the distribution. In any series it is the value of the item
which is most characteristic or common and is usually repeated the maximum number of times.
For ungrouped data mode is the value repeating most or with highest frequency.
For grouped data mode is calculated using the formula
$$M_o = l + \left[\frac{f_p}{f_p + f_s}\right] i \quad \text{or} \quad M_o = l + \left[\frac{f_m - f_p}{2f_m - f_p - f_s}\right] i$$
Where, l - Exact lower limit of the modal class (the class in which the mode lies, i.e., the class
corresponding to the highest frequency)
fm – Frequency of the modal class
fp – Frequency of the class preceding the modal class (above the modal class)
fs – Frequency of the class succeeding the modal class (below the modal class)
i – Class interval
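A minimal sketch of both cases follows. For ungrouped data it uses `statistics.multimode` from the Python standard library. For grouped data it implements only the second (difference-based) formula, under the assumption that the classes are listed in ascending order, so that the "preceding" class is the one just lower in value than the modal class; the function name and the frequency table are hypothetical.

```python
from statistics import multimode


def grouped_mode(classes, width):
    """classes: list of (exact_lower_limit, frequency) in ascending order.
    Uses Mo = l + ((fm - fp) / (2*fm - fp - fs)) * i."""
    freqs = [f for _, f in classes]
    m = freqs.index(max(freqs))                      # first class with highest frequency
    fm = freqs[m]
    fp = freqs[m - 1] if m > 0 else 0                # adjacent class just lower in value
    fs = freqs[m + 1] if m + 1 < len(freqs) else 0   # adjacent class just higher in value
    denominator = 2 * fm - fp - fs
    if denominator == 0:
        raise ValueError("formula undefined: modal class does not stand out")
    return classes[m][0] + ((fm - fp) / denominator) * width


# Ungrouped data: the value(s) repeated most often
print(multimode([2, 3, 3, 5, 7, 7, 3]))  # [3]
print(multimode([1, 1, 2, 2]))           # [1, 2]  (bimodal)

# Hypothetical grouped data
table = [(9.5, 4), (19.5, 6), (29.5, 10), (39.5, 7), (49.5, 3)]
print(round(grouped_mode(table, 10), 2))  # 35.21
```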
When to use mode
 In nominal data – Since we cannot use mean or median
 Also in ordinal, interval or ratio data, along with mean and median
 When a quick and approximate measure is to be determined, we compute mode.
 Mode is a very useful measure in the manufacturing industry as the most sold item i.e., modal
value is given more priority.
 When a histogram or frequency polygon is given, the measure that can be easily computed is
mode.
 When we wish to know the most typical case.
Advantages of Mode
 It is easily understood even by a common man.
 Mode can easily be computed merely by looking at the data. All that one has to do is to find
the score which is repeated the maximum number of times.
 It is an average widely used in everyday life. When we speak of average we generally refer to
mode e.g., average shoe size refers to that which is most sold.
 It is useful in situations in which it is desirable to eliminate extreme cases.
 It encourages attention to bimodal and multimodal distribution.
 It can be computed from a graph.

Limitations of Mode
 It is the most unstable measure of central tendency.
 It is not at all reliable in small samples. E.g., if the modal salary of 50 workers is Rs.500 per
month but 45 of them get different salaries, the mode is very unreal and gives a false
picture.
 It is incapable of further algebraic treatment
 A distribution can have more than one mode.
 It is not used in higher statistical analysis.

Relationship between Mean, Median and Mode


Mode = 3Median – 2Mean
Measures of variability or dispersion
There is a tendency for the scores to be dispersed, scattered or show variability around the average.
Thus the tendency of the attributes of a group to deviate from the average or central value is known as
dispersion or variability and the expected range of dispersion or variation above or below the average
or central value for a given data is called the measure of variability.
A measure of variability is a single value that gives us the degree of variability or dispersion i.e., the
scatter or spread of the individual scores throughout the distribution or given data.
When comparing two or more groups the measures of central tendency merely give us an idea of the
general characteristics of the groups as a whole. They do not show how the individual scores are spread
out. With the value of the measure of central tendency alone we are unable to know how the
scores are distributed in the group.
E.g., the scores of two groups in a test are as follows:
Group A: 40, 38, 36, 27, 20, 29, 28, 3, 5, 4 and Group B: 19, 20, 22, 18, 21, 23, 17, 20, 22, 18
The mean values are close (23 for Group A and 20 for Group B), so as far as the mean is concerned there
is little difference in the performance of the two groups. But on observing the scores we find that the
scores of Group A have a wide range (3 to 40) while those of Group B have a small range (17 to 23). The
scores in the latter group are less variable than those in the former. So the performance of Group A and
Group B cannot be considered the same.
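Computing the figures directly gives a quick check on this comparison. The short sketch below (the helper name `summarize` is made up for the example) reports the mean and the spread between the highest and lowest score for the two groups listed above.

```python
from statistics import mean

group_a = [40, 38, 36, 27, 20, 29, 28, 3, 5, 4]
group_b = [19, 20, 22, 18, 21, 23, 17, 20, 22, 18]


def summarize(scores):
    """Return the mean and the highest-minus-lowest spread of a list of scores."""
    return {"mean": mean(scores), "range": max(scores) - min(scores)}


print(summarize(group_a))  # mean 23, range 37
print(summarize(group_b))  # mean 20, range 6
```

The means differ by only 3 points, while the spread of Group A is about six times that of Group B.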
Therefore measures of central tendency alone provide an insufficient basis for the comparison of two or
more groups. For better comparison of the groups we need to pay attention to the variability or
dispersion of the scores in each group. Here lies the importance of measures of
variability or dispersion.

There are chiefly four measures of variability or dispersion


1. Range (R)
2. Quartile Deviation (Q)
3. Average Deviation (AD)
4. Standard Deviation (SD)

Range (R)
Range is the simplest measure of variability or dispersion. It is calculated by subtracting the lowest
score from the highest score in the series or data. It takes only extreme scores into consideration and
ignores the variation of individual items.
Range = Highest value – Lowest value
The computation of range is recommended when
 We need to know simply the highest and lowest scores of the total spread.
 The group or distribution is too small
 We want to know the variability within the group in very little time.
 We require speed and ease in the computation of a measure of variability.
 The distribution of the scores of the group is such that the computation of other measures of
variability is not very useful.
Merits of range
 It is very easily determined and understood.
 It is very useful as a supplementary measure. In addition to other measures it helps in the
description of data.
 It is a moderately reliable measure in large unimodal samples.
 It is a very simple measure of variability.
Demerits of Range
 It is not a representative measure of variability.
 It is based on only two extreme scores and tells nothing about the variation among other
intermediate scores.

Quartile Deviation (Q)


Quartile deviation is half of the inter-quartile range and is also known as the semi-inter-quartile range. It
is computed using the formula
$$Q = \frac{Q_3 - Q_1}{2}$$
where Q1 – 1st Quartile and Q3 – 3rd Quartile

To find Q1 and Q3:

For ungrouped data Q1 and Q3 are found by first arranging the items in ascending or descending order;
the values of the (N/4)th and (3N/4)th items then give Q1 and Q3.
For grouped data first write the true class limits and the cumulative frequencies of the classes.
Then locate the classes corresponding to Q1 and Q3. This is done by finding the classes containing the
(N/4)th and (3N/4)th items of the distribution. Then Q1 and Q3 are calculated using the formulas
$$Q_1 = l_1 + \left[\frac{\frac{N}{4} - F_1}{f_1}\right] i \quad \text{and} \quad Q_3 = l_3 + \left[\frac{\frac{3N}{4} - F_3}{f_3}\right] i$$

Where
l1 - Exact lower limit of the Q1 class, l3 - Exact lower limit of the Q3 class
F1 – Cumulative frequency of the classes below the Q1 class
F3 – Cumulative frequency of the classes below the Q3 class
f1 – Frequency of the Q1 class, f3 – Frequency of the Q3 class
i – Class interval, N – Total frequency ($N = \sum f$)
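The grouped-data computation of Q1, Q3 and Q can be sketched as follows. The function names, the input layout (exact lower limit and frequency pairs in ascending order) and the frequency table are assumptions for illustration; F1 and F3 are taken as the cumulative frequencies of the classes below the Q1 and Q3 classes.

```python
def grouped_point(classes, width, fraction):
    """Value below which the given fraction of cases falls (0.25 -> Q1, 0.75 -> Q3).
    classes: list of (exact_lower_limit, frequency) in ascending order."""
    n = sum(f for _, f in classes)
    target = fraction * n  # N/4 for Q1, 3N/4 for Q3
    cum = 0                # cumulative frequency below the current class
    for lower, f in classes:
        if f > 0 and cum + f >= target:
            return lower + ((target - cum) / f) * width
        cum += f
    raise ValueError("no frequencies given")


def quartile_deviation(classes, width):
    """Q = (Q3 - Q1) / 2."""
    q1 = grouped_point(classes, width, 0.25)
    q3 = grouped_point(classes, width, 0.75)
    return (q3 - q1) / 2


# Hypothetical frequency table with class interval 10
table = [(9.5, 4), (19.5, 6), (29.5, 10), (39.5, 7), (49.5, 3)]
print(round(grouped_point(table, 10, 0.25), 2))  # Q1 = 25.33
print(round(grouped_point(table, 10, 0.75), 2))  # Q3 = 43.07
print(round(quartile_deviation(table, 10), 2))   # Q  = 8.87
```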
The use of this measure is recommended when
 The distribution is skewed, containing a few very extreme scores.
 The measure of central tendency is available in the form of median.
 The distribution is truncated (irregular) or has some indeterminate end values.
 We have to determine the concentration around the middle 50 per cent of the cases
 The various percentiles and quartiles have been already computed.
Merits
 It is more representative than the range as it is not dependent on the extreme values.
 It is very easy to compute, to understand and to interpret.
 It is the most useful measure of variability in which median is used.
 It is applicable even to frequency distributions which have unequal class-intervals.
 It is quite useful in small samples and when there are extreme measures in the distribution.
Demerits
 25% of the scores fall below Q1 and 25% above Q3. Therefore Q is based on only the middle
50% of the scores.
 It is a non-algebraic measure and so less reliable than SD.

Average Deviation (AD)


Average Deviation (AD) is the mean of the deviations of the scores in the series taken from their mean
(occasionally from the median or mode). It is a simple measure that takes into account the fluctuation or
variation of all the items in the series. It is calculated using the formula
$$AD = \frac{\sum |X - M|}{N} \ \text{for ungrouped data and} \ AD = \frac{\sum f\,|X - M|}{N} \ \text{for grouped data}$$
Where, X is the individual score, f is the individual frequency
M is the mean,
N is the number of observations or $N = \sum f$
|X − M| signifies that in the deviation values we ignore the algebraic signs +ve or –ve. The ignoring
of algebraic signs constitutes a major weak point for this type of measure of variability.
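A minimal sketch of the two AD formulas, with deviations taken from the mean. The function names and the sample data are made up for the example; for grouped data, X is assumed to be the score value (or class midpoint) paired with its frequency f.

```python
def average_deviation(scores):
    """AD = sum(|X - M|) / N for ungrouped data."""
    n = len(scores)
    m = sum(scores) / n
    return sum(abs(x - m) for x in scores) / n  # signs are ignored via abs()


def average_deviation_grouped(values, freqs):
    """AD = sum(f * |X - M|) / N for grouped data, with N = sum(f)."""
    n = sum(freqs)
    m = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * abs(x - m) for x, f in zip(values, freqs)) / n


print(average_deviation([2, 4, 6, 8, 10]))              # 2.4
print(average_deviation_grouped([2, 4, 6], [1, 2, 1]))  # 1.0
```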
This measure is used when
 The deviations of the scores are normal or near normal.
 The SD is unduly influenced by the presence of extreme deviations.
 It is needed to weigh all deviations from the mean according to their size
 A less reliable measure of variability can be employed.
Merits
 It is a very simple measure of variability and is easily understood.
 It is also meaningful, even to common man.
 It is based on all items and takes into account the fluctuations of all items.
 It is reliable even for a small sample.

Demerits
 As it is based on all items it may be inflated or depressed by a single extreme value which is very
high or very low.
 As the signs are discarded and only absolute values are taken it is not an algebraic measure and
so cannot be reliably used in mathematical operations.

Standard deviation (SD)


Standard deviation of a set of scores is defined as the square root of the average of the squares of the
deviations of each score from the mean. It is often called the root mean square deviation
and is denoted by the Greek letter sigma (σ). The square of the SD is known as the variance of the
distribution. SD is regarded as the most stable and reliable measure of variability as it employs the mean for
its computation and does not ignore algebraic signs.
The use of SD is recommended when
 We need a most reliable measure of variability.
 There is a need of computation of further statistics like the correlation coefficients, significance
of difference between means and the like.
 Measure of central tendency is available in the form of mean
 The distribution is normal or near to normal.
Merits
 It is the most reliable measure of variability and is useful for further statistical operations and in
making inferences.
 It is most useful in those cases when mean has been taken as a measure of central tendency.
 The greater the value of SD the more the scores scatter from the mean. SD can be shown on
distribution curves. In a normal distribution, a distance of 1 SD on either side of the mean includes
about 68.3% of the cases, 2 SD about 95.4% of the cases and so on.
 It is an algebraic measure and does not suffer from the mathematical fallacy of the AD in which
signs are disregarded.
 It can be reliably used in most cases.
Demerits
 SD is not easily understood.
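 It is sensitive to extreme values.
It is calculated using the formula
$$SD = \sqrt{\frac{\sum (X - M)^2}{N}} \quad \text{or} \quad SD = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2} \quad \text{for ungrouped data}$$
And
$$SD = \sqrt{\frac{\sum f (X - M)^2}{N}} \quad \text{or} \quad SD = \sqrt{\frac{\sum f X^2}{N} - \left(\frac{\sum f X}{N}\right)^2} \quad \text{for grouped data.}$$
Where, X is the individual score, f is the individual frequency
M is the mean, N is the number of observations or $N = \sum f$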
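The sketch below codes the deviation form and the raw-score form for ungrouped data, plus the raw-score form for grouped data, and shows that they agree. The function names and sample scores are illustrative assumptions. All versions divide by N, so they match `statistics.pstdev` (the population SD) rather than `statistics.stdev`.

```python
from math import sqrt


def sd_from_deviations(scores):
    """SD = sqrt( sum((X - M)^2) / N )."""
    n = len(scores)
    m = sum(scores) / n
    return sqrt(sum((x - m) ** 2 for x in scores) / n)


def sd_from_raw_scores(scores):
    """SD = sqrt( sum(X^2)/N - (sum(X)/N)^2 )."""
    n = len(scores)
    return sqrt(sum(x * x for x in scores) / n - (sum(scores) / n) ** 2)


def sd_grouped(values, freqs):
    """SD = sqrt( sum(f*X^2)/N - (sum(f*X)/N)^2 ), with N = sum(f)."""
    n = sum(freqs)
    mean_of_squares = sum(f * x * x for x, f in zip(values, freqs)) / n
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sqrt(mean_of_squares - mean ** 2)


scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(sd_from_deviations(scores))                       # 2.0
print(sd_from_raw_scores(scores))                       # 2.0
print(sd_grouped([2, 4, 5, 7, 9], [1, 3, 2, 1, 1]))     # 2.0 (same data as a frequency table)
```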

CORRELATION
In measures of central tendency and dispersion, our study was confined to one variable only. But we
often come across problems involving two or more variables, where the items of one variable bear some relation
to, or influence, the values of the other variable. For example, rainfall and
agricultural yield, height and weight, age of husband and wife. The term correlation is used to indicate the
relationship between two such variables in which, with changes in the values of one variable, the values of the
other variable also change. Thus, if with a change in the price of a commodity the demand for that
commodity changes, we would say that price and demand are related to each other. Correlation may be described as
“a connection or relationship between two or more things that is not caused by chance.”
Thus correlation analysis refers to the technique used in measuring the closeness of the relationship between
the variables.
L. R. Connor: “If two or more quantities vary in sympathy, so that movements in one tend to be accompanied by
corresponding movements in the other, then they are said to be correlated.”
A. M. Tuttle defined correlation as “an analysis of the co-variation of two or more variables”.

Importance of correlation
 Most of the variables show some kind of relationship. For instance, there is relationship between price
and supply, income and expenditure etc... With the help of correlation analysis we can measure in one
figure the degree of relationship.
 It helps to ascertain the traits and capabilities of pupils while giving guidance or counselling.
 Once we know variables are closely related, we can estimate the value of one variable given the value
of another. This is done with the help of regression.
 Correlation analysis contributes to the understanding of economic behaviour, aids in locating the
critically important variable on which others depend.
 Progressive development in the methods of science and philosophy has been characterized by an increase
in the knowledge of relationships.
 The effect of correlation is to reduce the range of uncertainty. A prediction based on correlation
analysis is likely to be more reliable and nearer to reality.
 Co-efficient of correlation is vital for all kinds of research work
 It helps in establishing validity or reliability of an evaluation tool.
TYPES OF CORRELATION
Simple, partial and multiple correlation
The distinction between simple, partial and multiple correlation is based on the number of variables studied.
 When the relationship between only two variables is studied, it is a case of SIMPLE CORRELATION.
 When the relationship between any two out of three or more variables is studied, ignoring the effect of
the other related variables, it is a case of PARTIAL CORRELATION.
 When the relationship between three or more variables is studied simultaneously, it is a case of MULTIPLE
CORRELATION.

Positive and negative correlation A correlation may be positive or negative depending upon the direction of
change of the variables.
POSITIVE CORRELATION is one where values of both the variables under study move in the same direction. The
data of positive correlation when plotted on a graph paper give an upward curve.

Increase in one variable → Increase in the other variable.


Decrease in one variable → Decrease in the other variable.
Example : (1) Demand and Production.
(2) Diameter and circumference of a circle.
NEGATIVE CORRELATION is one where the two variables under study move in opposite directions. The
data of negative correlation when plotted on a graph paper give a downward curve.

Decrease in one variable → Increase in the other variable.


Increase in one variable → Decrease in the other variable.
Example : (1) Price and Demand.
(2) Speed and time.
PERFECT AND IMPERFECT CORRELATION
 When the values of both variables under study change at a constant ratio irrespective of the direction,
it is a case of PERFECT CORRELATION (ideal correlation). The graph plotted would be a perfect straight
line in the upward direction for perfect positive correlation and a perfect straight line in the downward
direction for perfect negative correlation.
 When the values of the variables under study change at different ratios it is a case of IMPERFECT
CORRELATION.
 When correlation is measured mathematically, the value of perfect correlation will be either +1 or
-1 and the value of imperfect correlation will lie between -1 and +1.

[Figure: graphs of perfect positive correlation and perfect negative correlation]


SCATTER DIAGRAM
Let (x1,y1), (x2,y2), …, (xn,yn) be the set of observations obtained in a study of a population in which two
characteristics are considered. A diagram obtained by plotting the points with co-ordinates (x1,y1), (x2,y2), …, (xn,yn)
is called a scatter diagram. It consists of ‘n’ points scattered over the x-y plane.

[Figure: scatter diagrams showing positive correlation, zero correlation and negative correlation]


LINEAR CORRELATION
When the variations in the values of two variables are in a constant ratio, the correlation is said to be linear.
NON - LINEAR CORRELATION
In some cases, the ratio of the change in the two variables may not be constant and hence the correlation is not linear. It is
then said to be curvilinear or non-linear.
COEFFICIENT OF CORRELATION The ratio indicating the degree of relationship between a pair of variables is
called the coefficient of correlation. A correlation coefficient is a statistical measure of the degree to which changes
in the value of one variable predict changes in the value of another. The statistical tools with the help of which
the relationship between two or more variables is studied are called measures of correlation. The
measure of correlation, called the correlation coefficient, summarises in one figure the direction and extent of
correlation.
The coefficient of correlation is calculated to study the extent or degree of correlation between two variables. It is
a numerical index that expresses the relationship between two variables. It is usually represented by the letter ‘r’.
 Correlation can be calculated as a number called the correlation coefficient. The coefficient of correlation can
help identify what type of relationship the data sets have and how strong or weak the relationship is.
 The coefficient of correlation varies from -1 to 1.
 A positive correlation coefficient varies from zero to 1 and a negative correlation coefficient varies from zero to
-1.
 Zero correlation indicates no consistent relationship, and it is written as “0”.

USES OF COEFFICIENT OF CORRELATION


 It may be used in determining the reliability of a test.
 One of the main purposes of correlation is prediction. Predictions are possible with the use of
regression equations.
 Partial and multiple correlation may be helpful in the analysis of relationship between various factors.
 Correlation is also useful in the calculation of validity.
ADVANTAGES OF COEFFICIENT OF CORRELATION
 It is easy to work out and easy to interpret.
 It not only gives an idea about the co-variation of the two series but also indicates the direction of
the relationship.
 It gives a precise and quantitative figure which can be interpreted meaningfully.
 It can answer the validity of arguments for or against a statement.
DISADVANTAGES OF COEFFICIENT OF CORRELATION
 The value of correlation coefficient is unduly affected by extreme item.
 The coefficient of correlation may give a misleading picture of the extent of the relationship between
the variables if the data are not reasonably homogeneous.
 It is tedious to calculate.
 It assumes a linear relationship between the variables even though it may not be there.

Spearman’s rank correlation
$$\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$
Where D represents the difference in ranks and N represents the number of observations.
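A minimal sketch of the rank difference method, assuming the standard formula ρ = 1 − 6ΣD² / [N(N² − 1)]. The function names and the marks are hypothetical. Rank 1 is given to the highest score and tied scores share the average of their ranks; with ties the formula is only approximate.

```python
def ranks(scores):
    """Rank 1 = highest score; tied scores share the average of their positions."""
    ordered = sorted(scores, reverse=True)
    positions = {}
    for pos, s in enumerate(ordered, start=1):
        positions.setdefault(s, []).append(pos)
    return [sum(positions[s]) / len(positions[s]) for s in scores]


def spearman_rho(x, y):
    """rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), D = difference between paired ranks."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - (6 * d_squared) / (n * (n * n - 1))


# Hypothetical marks of five pupils in two tests
test_1 = [50, 60, 70, 80, 90]
test_2 = [55, 65, 60, 85, 80]
print(spearman_rho(test_1, test_2))        # 0.8
print(spearman_rho(test_1, test_1[::-1]))  # -1.0 (ranks exactly reversed)
```

In the first example the rank differences are 0, 1, −1, 1, −1, so ΣD² = 4 and ρ = 1 − 24/120 = 0.8.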

INTERPRETATION OF CORRELATION
By interpretation we mean judging how high a given coefficient of correlation is. Any coefficient
of correlation that is not zero and that is also statistically significant denotes some degree of relationship
between the two variables. As regards the strength of the relationship between the two variables, the
value of r cannot be read directly as a percentage. The
coefficient of correlation is an index number, not a measurement on a linear scale of equal units.
Nevertheless, correlation enables us to find out the relationship between the two variables, and the value of
r reflects the strength of that relationship. The strength of relationship
between the two variables can be described roughly as under for various r’s:
 less than .20 slight, at most negligible relationship
 .20 to .40 low correlation
 .40 to .70 moderate correlation
 .70 to .90 high correlation
 .90 to 1.00 very high correlation.
 It may be noted that the relationship, i.e., the correlation, may be either positive or negative,
so these descriptions apply to the size of r regardless of sign; in no case can the value of r be
greater than +1 or less than -1.
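The rough guide above can be turned into a small lookup, sketched below. The function name `describe_r` is made up for the example, and because the bands in the list share their end points (.40 ends one band and starts the next), assigning a boundary value to the higher band is an assumption of this sketch.

```python
def describe_r(r):
    """Return a rough verbal description of a correlation coefficient r."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength is judged on the size of r, ignoring its sign
    if size < 0.20:
        label = "slight, at most negligible relationship"
    elif size < 0.40:
        label = "low correlation"
    elif size < 0.70:
        label = "moderate correlation"
    elif size < 0.90:
        label = "high correlation"
    else:
        label = "very high correlation"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "zero"
    return f"{direction}: {label}"


print(describe_r(0.8))    # positive: high correlation
print(describe_r(-0.35))  # negative: low correlation
```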
