
Discrete variable: A variable whose set of possible values either is finite or can be listed in an infinite sequence

Continuous variable: A variable that can take on an unlimited number of values between the
lowest and highest points of measurement

Frequency

Frequency distribution: A tabular summary of the data showing the frequencies or relative frequencies
of items in each of several non-overlapping classes.

Relative frequency:

Relative frequency of a value = (number of times the value occurs, f) / (number of observations in the data set, N)

R.F. = f/N
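To make R.F. = f/N concrete, here is a minimal Python sketch that tallies each distinct value with `collections.Counter` and divides each count by N. The data (defects per item) are hypothetical, made up only to illustrate the arithmetic, and the function name is illustrative.

```python
from collections import Counter


def relative_frequencies(data):
    """Return {value: (f, f/N)} for each distinct value, sorted by value."""
    n = len(data)                      # N, number of observations
    counts = Counter(data)             # f for each distinct value
    return {value: (f, f / n) for value, f in sorted(counts.items())}


# Hypothetical discrete data: number of defects found on each of 10 items
defects = [0, 1, 1, 2, 0, 3, 1, 0, 2, 1]

table = relative_frequencies(defects)
for value, (f, rf) in table.items():
    print(f"value {value}: f = {f}, R.F. = {rf:.2f}")
```

The relative frequencies always add up to 1 (here 0.30 + 0.40 + 0.20 + 0.10).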

Histograms: A graphical display of data using bars of different heights


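Constructing a Histogram for Discrete Data:

First, make a frequency distribution, i.e. determine the frequency and relative frequency of each value. Then mark the possible values on a horizontal scale and, above each value, draw a rectangle whose height is the relative frequency (or the frequency) of that value.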
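A plotting library would normally draw the rectangles; to stay self-contained, the sketch below prints a text version instead. It builds the frequency distribution of a discrete data set and, for each distinct value, prints a horizontal bar of '#' characters whose length is proportional to that value's relative frequency. The data (children per household) are hypothetical, and `width`, the number of characters that would stand for a relative frequency of 1, is an arbitrary display choice.

```python
from collections import Counter


def discrete_histogram(data, width=40):
    """Return one text row per distinct value: value | bar  R.F.

    The bar length is proportional to the relative frequency; `width` is the
    number of '#' characters that would represent a relative frequency of 1.
    """
    n = len(data)
    counts = Counter(data)             # the frequency distribution
    rows = []
    for value in sorted(counts):       # values in order along the value scale
        rf = counts[value] / n
        bar = "#" * round(rf * width)
        rows.append(f"{value:>3} | {bar} {rf:.2f}")
    return rows


# Hypothetical discrete data: number of children in each of 12 households
children = [0, 2, 1, 2, 3, 2, 1, 0, 2, 4, 1, 2]

for row in discrete_histogram(children):
    print(row)
```

Only values that actually occur get a row; a value with zero frequency would simply have no bar.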

A histogram is a plot that lets you discover, and show, the underlying frequency
distribution (shape) of a set of continuous data. This lets you inspect the data for
its underlying distribution (e.g., a normal distribution), outliers, skewness, etc. An example
of a histogram, and the raw data (ages, in years) it was constructed from, is shown below:

36 25 38 46 55 68 72 55 36 38
67 45 22 48 91 46 52 61 58 55
How do you construct a histogram from a continuous variable?
To construct a histogram from a continuous variable, you first need to split the data into
intervals, called classes (or bins). In the example above, age has been split into classes,
each representing a 10-year period starting at 20 years. Each class contains the number of
scores in the data set that fall within that class. For the above data set, the frequency
in each class has been tabulated, along with the scores that contributed to it (see below):

Class (years)   Frequency   Scores included in class
20-30           2           25, 22
30-40           4           36, 38, 36, 38
40-50           4           46, 45, 48, 46
50-60           5           55, 55, 52, 58, 55
60-70           3           68, 67, 61
70-80           1           72
80-90           0           -
90-100          1           91
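The tabulation above can be reproduced with a short Python sketch. It assumes classes of the form [lower, upper), so a score falling exactly on a boundary such as 30 would be counted in the upper class (30-40); that is one possible boundary convention, not the only one. The helper name `class_frequencies` is hypothetical.

```python
ages = [36, 25, 38, 46, 55, 68, 72, 55, 36, 38,
        67, 45, 22, 48, 91, 46, 52, 61, 58, 55]


def class_frequencies(data, start, width, n_classes):
    """Count observations in the classes [start + k*width, start + (k+1)*width).

    A score exactly on a boundary (e.g. 30) goes into the upper class (30-40).
    Observations outside the overall range are ignored.
    """
    freqs = [0] * n_classes
    for x in data:
        k = int((x - start) // width)  # index of the class containing x
        if 0 <= k < n_classes:
            freqs[k] += 1
    return freqs


freqs = class_frequencies(ages, start=20, width=10, n_classes=8)
for k, f in enumerate(freqs):
    lower = 20 + 10 * k
    print(f"{lower}-{lower + 10}: {f}")
```

The printed counts match the Frequency column of the table: 2, 4, 4, 5, 3, 1, 0, 1.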

Notice that, unlike a bar chart, a histogram has no gaps between its bars (although some
bars may be absent where a class has zero frequency, such as 80-90 here). This is because
a histogram represents a continuous data set, so there are no gaps in the data. You will,
however, have to decide whether a score that falls exactly on a class boundary is counted
in the lower or the upper class.

There is no hard-and-fast rule for the number of classes, but a reasonable rule of thumb is:

Number of classes ≈ √(number of observations)
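A one-line helper applying the square-root rule. Rounding to the nearest whole number (and never going below one class) is an assumption of this sketch; the rule only suggests a starting point. For the 20 ages in the example it suggests about 4 classes, whereas the table uses 8 classes of width 10, which shows that the rule is a guide rather than a requirement.

```python
import math


def suggested_classes(n_observations):
    """Rule of thumb: number of classes ≈ sqrt(number of observations).

    Rounding to the nearest integer (and never going below 1) is an
    assumption of this sketch; treat the result as a starting point.
    """
    return max(1, round(math.sqrt(n_observations)))


for n in (20, 50, 100, 400):
    print(f"{n} observations -> about {suggested_classes(n)} classes")
```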

After determining the frequencies and relative frequencies, calculate the height of each
rectangle:

Rectangle height = relative frequency of the class / class width

The resulting rectangle heights are usually called densities, and the vertical scale is the density
scale. This prescription gives a correct picture when class widths are unequal, and it also works when they are equal.
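The density calculation can be sketched in Python for the age data and its eight 10-year classes. Because all widths are equal here, each density is simply R.F./10, so the heights are proportional to the frequencies; the final line checks that the rectangle areas add up to 1. Function and variable names are illustrative.

```python
ages = [36, 25, 38, 46, 55, 68, 72, 55, 36, 38,
        67, 45, 22, 48, 91, 46, 52, 61, 58, 55]

# The eight 10-year classes from the table: 20-30, 30-40, ..., 90-100
classes = [(20 + 10 * k, 30 + 10 * k) for k in range(8)]


def density_rows(data, classes):
    """Return (lower, upper, f, R.F., density) for each class [lower, upper)."""
    n = len(data)
    rows = []
    for lower, upper in classes:
        f = sum(1 for x in data if lower <= x < upper)
        rf = f / n
        rows.append((lower, upper, f, rf, rf / (upper - lower)))  # density = R.F. / width
    return rows


rows = density_rows(ages, classes)
for lower, upper, f, rf, density in rows:
    print(f"{lower}-{upper}: f = {f}, R.F. = {rf:.2f}, density = {density:.3f}")

# Each rectangle's area is density * width = R.F., so the areas add up to 1
total_area = sum(density * (upper - lower) for lower, upper, _, _, density in rows)
print(f"total area = {total_area:.2f}")
```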

Histograms when class widths are unequal


In a histogram, it is the area of the bar that indicates the frequency of occurrences in
each class. This means that the height of a bar does not necessarily indicate how
many scores fall within that class.
Relative frequency = (class width)(density)

= (rectangle width)(rectangle height)

= rectangle area

It is the product of the height and the width of the bar that indicates the
frequency of occurrences within that class. One reason the height of a bar is often
incorrectly taken to indicate frequency, rather than its area, is that many histograms
have classes of equal width, and under these circumstances the height of a bar does
reflect the frequency.
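To see why area rather than height carries the frequency, the sketch below regroups the same 20 ages into hypothetical unequal classes (20-40, 40-60 and 60-100, chosen only for illustration) and computes each height as density = R.F. / class width. The function name is illustrative.

```python
ages = [36, 25, 38, 46, 55, 68, 72, 55, 36, 38,
        67, 45, 22, 48, 91, 46, 52, 61, 58, 55]

# Hypothetical unequal classes, chosen only to illustrate the point
classes = [(20, 40), (40, 60), (60, 100)]


def density_table(data, classes):
    """Return (lower, upper, f, R.F., density) for each class [lower, upper)."""
    n = len(data)
    table = []
    for lower, upper in classes:
        f = sum(1 for x in data if lower <= x < upper)
        rf = f / n
        table.append((lower, upper, f, rf, rf / (upper - lower)))
    return table


table = density_table(ages, classes)
for lower, upper, f, rf, density in table:
    area = density * (upper - lower)   # rectangle area recovers R.F.
    print(f"{lower}-{upper}: f = {f}, height (density) = {density:.5f}, "
          f"area = {area:.2f}, R.F. = {rf:.2f}")
```

The 60-100 class holds 5 ages, almost as many as the 6 in 20-40, but it is twice as wide, so its density (0.00625) is less than half that of 20-40 (0.015). A bar whose height equalled the frequency would hide this; with density heights, each bar's area equals its relative frequency.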

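Shapes of a histogram

Unimodal: a single peak

Bimodal: two different peaks

Multimodal: more than two peaks

Symmetric: the left half of the histogram is a mirror image of the right half

Positively skewed: the right (upper) tail stretches out farther than the left (lower) tail

Negatively skewed: the left (lower) tail stretches out farther than the right (upper) tail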

Unit 1.3

Measures of location

Mean (simple mean, arithmetic mean) = (x1 + x2 + x3 + ... + xN)/N
Median: the middle value of the series

Order the N observations from smallest to largest, then take:

the ((N+1)/2)th item when N is odd

the average of the (N/2)th item and the (N/2 + 1)th item when N is even
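Both formulas translate directly into code. The sketch below applies them to the 20 ages from the histogram example; Python lists are 0-based, so the kth item in order sits at index k - 1. The function names are illustrative.

```python
def mean(values):
    """Arithmetic mean: (x1 + x2 + ... + xN) / N."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered series, using the odd/even rules above."""
    s = sorted(values)                  # order from smallest to largest
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]      # ((N+1)/2)th item
    return (s[n // 2 - 1] + s[n // 2]) / 2  # average of (N/2)th and (N/2 + 1)th items


ages = [36, 25, 38, 46, 55, 68, 72, 55, 36, 38,
        67, 45, 22, 48, 91, 46, 52, 61, 58, 55]

print(mean(ages))    # 50.7
print(median(ages))  # 50.0  (N = 20 is even: average of the 10th item, 48, and the 11th, 52)
```

Python's standard `statistics` module also provides `statistics.mean` and `statistics.median`, which can be used to cross-check these hand-written versions.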

Trimmed Mean

A trimmed mean is a method of averaging that removes a small designated percentage
of the largest and smallest values before calculating the mean. After the specified
observations have been removed, the trimmed mean is found using the standard arithmetic
averaging formula. Using a trimmed mean helps reduce the influence of outliers, or data
points in the tails, that may unfairly affect the traditional mean.

2.0 2.4 2.5 2.6 2.6 2.7 2.7 2.8 3.0 3.1 3.2 3.3
3.4 3.4 3.6 3.6 3.6 3.6 3.7 4.4 4.6 4.7 4.8 5.3

N= 26

Number of items deleted on each end = (trimming percentage / 100) × N

Trimming percentage = (number of items deleted on each end / N) × 100
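A minimal Python sketch of the trimmed mean and the trimming-percentage formula. When (trimming percentage / 100) × N is not a whole number, this sketch rounds the count down; that is an assumption, since texts handle fractional counts differently. It is applied to the 20 ages from the histogram example, and the function names are illustrative.

```python
def trimmed_mean(values, trim_percent):
    """Mean after deleting trim_percent % of the observations from each end.

    Items deleted on each end = (trim_percent / 100) * N, rounded down here
    (an assumption of this sketch).
    """
    s = sorted(values)
    n = len(s)
    k = int(trim_percent * n // 100)   # number of items deleted on each end
    if 2 * k >= n:
        raise ValueError("trimming percentage too large for this data set")
    kept = s[k:n - k]                  # drop the k smallest and k largest
    return sum(kept) / len(kept)


def trimming_percentage(k, n):
    """Trimming percentage = (items deleted on each end / N) * 100."""
    return k * 100 / n


ages = [36, 25, 38, 46, 55, 68, 72, 55, 36, 38,
        67, 45, 22, 48, 91, 46, 52, 61, 58, 55]

print(trimmed_mean(ages, 10))       # 50.25  (drops 22, 25 and 72, 91)
print(trimming_percentage(2, 20))   # 10.0
```

With 10% trimming, 2 items are removed from each end; the trimmed mean (50.25) sits slightly below the ordinary mean (50.7) because the largest value, 91, no longer pulls it upward.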
