Unit 2 Fod
UNIT II
DESCRIBING DATA
COMPILED BY,
VERIFIED BY
Types of Data
Types of Variables
Describing Data with Tables and Graphs
Describing Data with Averages
Describing Variability
Normal Distributions and Standard (z) Score
LIST OF IMPORTANT QUESTIONS
UNIT - I
INTRODUCTION
PART – A
1. Write the three types of data.
2. What is Random sampling?
3. What do you mean by Descriptive Statistics?
4. Write the difference between Parameter and Statistic.
5. Define Statistics and write its types.
6. What is Descriptive Statistics?
7. What is Inferential Statistics?
8. What do you mean by Mean?
9. What do you mean by Median?
10. What do you mean by Mode?
11. What do you mean by Variance?
12. What do you mean by Standard Deviation?
13. What do you mean by Range?
14. What are Probability distributions?
15. What do you mean by Graphical representations?
PART – B
1. Illustrate the steps in constructing a frequency distribution and Construct
a frequency distribution for grouped data. Specify the real limits for the
lowest class interval in this frequency distribution.
2. The frequency distribution in the table shows the annual incomes in
dollars for a group of college graduates.
a. Construct a histogram.
b. Construct a frequency polygon.
c. Is this distribution balanced or lopsided?
INTRODUCTION
PART – A
4. Write the difference between Parameter and Statistic.
We speak about populations and samples every day, so it is important to know
the terms used to describe each. A parameter is a number that describes data
from a population, whereas a statistic is a number that describes data from a
sample.
1. Descriptive Statistics
2. Inferential Statistics
15. What do you mean by Graphical representations?
We have a bar plot, line plot, frequency plot, dot plot, boxplot, and the Normal Q-Q plot.
16. Students in a theater arts appreciation class rated the classic film “The Wizard of
Oz” on a 10-point scale, ranging from 1 (poor) to 10 (excellent), as follows:
17. The IQ scores for a group of 35 high school dropouts are as follows:
18. Construct a bar graph for the data shown in the following table
20. What is the interquartile range?
The interquartile range (IQR) contains the second and third quartiles, or the
middle half of our data set. Whereas the range gives us the spread of the whole
data set, the interquartile range gives us the range of the middle half of a data set.
25. What is the use of frequency polygon?
A frequency polygon is a type of line graph in which straight line segments join
points plotted above the midpoints of the class intervals, at heights equal to the
class frequencies. The shape of the resulting line shows the shape of the
distribution. Both line graphs and frequency polygons are widely used when
data sets need to be compared.
3. Round off to the nearest convenient interval (such as 1, 2, 3, ..., 10,
particularly 5 or 10, or multiples of 5 or 10). In the present example, the
nearest convenient interval is 10.
4. Determine where the lowest class should begin.
(Ordinarily, this number should be a multiple of the class interval.) In the
present example, the smallest score is 133, and therefore the lowest class
should begin at 130, since 130 is a multiple of 10 (the class interval).
5. Determine where the lowest class should end
By adding the class interval to the lower boundary and then subtracting
one unit of measurement. In the present example, add 10 to 130 and then
subtract 1, the unit of measurement, to obtain 139—the number at which
the lowest class should end.
6. Working upward, list as many equivalent classes as are required to
include the largest observation.
In the present example, list 130–139, 140–149, . . . , 240–249, so that the
last class includes 245, the largest score.
7. Indicate with a tally the class in which each observation falls.
For example, the first score in Table 1.1, 160, produces a tally next to
160–169; the next score, 193, produces a tally next to 190–199; and so on.
8. Replace the tally count for each class with a number, the frequency (f),
and show the total of all frequencies.
(Tally marks are not usually shown in the final frequency distribution.)
9. Supply headings for both columns and a title for the table.
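The steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `grouped_frequency` and the ten scores are hypothetical (they are not the scores from Table 1.1), and the code assumes whole-number scores with a unit of measurement of 1.

```python
from collections import Counter

def grouped_frequency(scores, width):
    """Return (lower, upper, frequency) rows, lowest class first.

    Assumes whole-number scores and a unit of measurement of 1.
    """
    # Step 4: begin the lowest class at a multiple of the class interval.
    start = (min(scores) // width) * width
    # Step 7: tally each score under the lower boundary of its class.
    tally = Counter((s // width) * width for s in scores)
    rows = []
    lower = start
    # Step 6: list classes until the largest observation is included.
    while lower <= max(scores):
        # Step 5: the class ends at lower + width - 1 unit of measurement.
        rows.append((lower, lower + width - 1, tally[lower]))
        lower += width
    return rows

# Hypothetical scores, chosen so the smallest is 133 and the largest is 245.
scores = [160, 193, 133, 245, 171, 168, 205, 137, 182, 190]
table = grouped_frequency(scores, width=10)

# Steps 8 and 9: show the frequencies, a heading, and the total.
print("Score     f")
for lower, upper, f in table:
    print(f"{lower}-{upper}   {f}")
print("Total    ", sum(f for _, _, f in table))
```

With these scores the lowest class is 130-139 and the highest is 240-249, matching the class boundaries worked out in the steps.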
The IQ scores for a group of 35 high school dropouts are as follows:
(a) Construct a frequency distribution for grouped data.
(b) Specify the real limits for the lowest class interval in this frequency
distribution.
(b) 64.5–69.5
The real limits are located at the midpoint of the gap between adjacent tabled
boundaries; that is, one-half of one unit of measurement below the lower tabled
boundary and one-half of one unit of measurement above the upper tabled
boundary. (65-0.5 = 64.5) & (69+0.5 = 69.5)
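The same arithmetic can be written as a small helper. This is a minimal sketch; the function name `real_limits` is hypothetical, and the unit of measurement is assumed to be 1 unless stated otherwise.

```python
def real_limits(lower, upper, unit=1):
    """Return the real limits of a class with tabled boundaries lower..upper."""
    half = unit / 2  # one-half of one unit of measurement
    return lower - half, upper + half

print(real_limits(65, 69))  # (64.5, 69.5)
```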
2. The frequency distribution in the table shows the annual incomes in dollars
for a group of college graduates.
a. Construct a histogram.
b. Construct a frequency polygon.
c. Is this distribution balanced or lopsided?
NOTE: Ordinarily, only either (a) a histogram, or (b) a frequency polygon
would be shown.
When closing the left flank of (b), imagine extending a line to the midpoint of
the first unoccupied class (–10,000 to –1) on the left, but stop the line at the
vertical axis, as shown.
(c) Lopsided.
4. Square each total from Step 3.
5. Add up the figures from Step 4.
The mean (average) of a data set is found by adding all numbers in the data set
and then dividing by the number of values in the set. The median is the middle value
when a data set is ordered from least to greatest. The mode is the number that occurs
most often in a data set.
The mode is the number that appears most often in the given data set.
Determine the mode for the following retirement ages: 60, 63, 45, 63, 65, 70, 55, 63,
60, 65, 63.
Mode=63
The owner of a new car conducts six gas mileage tests and obtains the following
results, expressed in miles per gallon: 26.3, 28.7, 27.4, 26.6, 27.4, and 26.9
Find the mode for these data.
Mode=27.4
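Both answers can be checked with Python's standard `statistics` module. This is just a quick verification sketch; the variable names are chosen for illustration.

```python
from statistics import mode, multimode

retirement_ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]
mileage = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]

# mode() returns the single most frequent value.
print(mode(retirement_ages))  # 63
print(mode(mileage))          # 27.4

# multimode() returns every value tied for most frequent,
# which helps detect distributions with more than one mode.
print(multimode(retirement_ages))  # [63]
```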
Quartiles segment any distribution that’s ordered from low to high into four
equal parts. The interquartile range (IQR) contains the second and third
quartiles, or the middle half of our data set.
Whereas the range gives us the spread of the whole data set, the interquartile
range gives us the range of the middle half of a data set.
Calculate the interquartile range
The interquartile range is found by subtracting the Q1 value from the Q3 value:
IQR = Q3 − Q1
Q1 is the value below which 25 percent of the distribution lies, while Q3 is the value
below which 75 percent of the distribution lies.
We can think of Q1 as the median of the first half and Q3 as the median of
the second half of the distribution.
Methods for finding the interquartile range
Although there’s only one formula, there are various different methods for identifying
the quartiles. We’ll get a different value for the interquartile range depending on the
method we use.
Here, we’ll discuss two of the most commonly used methods. These methods differ
based on how they use the median.
The procedure for finding the median is different depending on whether our data set is
odd- or even-numbered.
When we have an odd number of data points, the median is the value in the
middle of our data set. We can choose between the inclusive and exclusive
method.
With an even number of data points, there are two values in the middle, so the
median is their mean. It’s more common to use the exclusive method in this
case.
While there is little consensus on the best method for finding the interquartile range,
the exclusive interquartile range is never smaller than the inclusive interquartile range, and is usually larger.
The exclusive interquartile range may be more appropriate for large samples, while for
small samples, the inclusive interquartile range may be more representative because
it’s a narrower range.
Steps for the exclusive method
To see how the exclusive method works by hand, we’ll use two examples: one with an
even number of data points, and one with an odd number.
Step 2: Locate the median, and then separate the values below it from the
values above it.
With an even-numbered data set, the median is the mean of the two values in the
middle, so we simply divide our data set into two halves.
Step 3: Find Q1 and Q3.
Q1 is the median of the first half and Q3 is the median of the second half. Since each
of these halves has an odd number of values, there is only one value in the middle of
each half.
Step 4: Calculate the interquartile range.
Step 2: Locate the median, and then separate the values below it from the
values above it.
In an odd-numbered data set, the median is the number in the middle of the list. The
median itself is excluded from both halves: one half contains all values below the
median, and the other contains all the values above it.
Almost all of the steps for the inclusive and exclusive methods are identical. The
difference is in how the data set is separated into two halves.
The inclusive method is sometimes preferred for odd-numbered data sets because it
doesn’t ignore the median, a real value in this type of data set.
Step 2: Separate the list into two halves, and include the median in both halves.
The median is included as the highest value in the first half and the lowest value in the
second half.
Step 3: Find Q1 and Q3.
Q1 is the median of the first half and Q3 is the median of the second half. Since the
two halves each contain an even number of values, Q1 and Q3 are calculated as the
means of the middle values.
We can see from these examples that using the inclusive method gives us a smaller
IQR. With the same data set, the exclusive IQR is 24, and the inclusive IQR is 20.
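The two ways of splitting the data can be compared in code. The sketch below is illustrative only: the function name `iqr_by_halves` and the nine data values are hypothetical, so the results differ from the 24 and 20 quoted above, which come from a different data set.

```python
from statistics import median

def iqr_by_halves(values, inclusive=False):
    """IQR from the medians of the lower and upper halves of the data.

    For an odd number of values, the exclusive method leaves the median
    out of both halves, and the inclusive method puts it in both halves.
    For an even number of values the data simply splits in two.
    """
    data = sorted(values)
    half = len(data) // 2
    if len(data) % 2 == 0:
        lower, upper = data[:half], data[half:]
    elif inclusive:
        lower, upper = data[:half + 1], data[half:]
    else:
        lower, upper = data[:half], data[half + 1:]
    q1, q3 = median(lower), median(upper)
    return q3 - q1

# Hypothetical odd-numbered data set (nine values, median 12).
data = [2, 4, 7, 9, 12, 15, 18, 21, 25]
print(iqr_by_halves(data))                  # exclusive: 19.5 - 5.5 = 14.0
print(iqr_by_halves(data, inclusive=True))  # inclusive: 18 - 7 = 11
```

Library quartile functions may use interpolation rules that differ from this median-of-halves approach, so their results will not always match a hand calculation.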
Normal
Any distribution that approximates the normal shape in panel A of the figure can
be analysed with the aid of the well-documented normal curve.
Bimodal
Positively Skewed
A lopsided distribution caused by a few extreme observations in the positive
direction (to the right of the majority of observations) is a positively skewed
distribution.
Negatively Skewed
7. Explain how to construct a frequency polygon from a histogram.
A frequency polygon is closely related to a histogram and is used to
compare sets of data or to display a cumulative frequency distribution. It uses a line
graph to represent quantitative data.
Statistics deals with the collection of data and information for a particular
purpose. The tabulation of each run for each ball in cricket gives the statistics of the
game. Tables, graphs, pie-charts, bar graphs, histograms, polygons etc. are used to
represent statistical data pictorially.
To draw a frequency polygon, first draw a histogram and then follow the
steps below:
Step 1- Choose the class interval and mark the values on the horizontal axis.
Step 2- Mark the mid value of each interval on the horizontal axis.
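The mid values used for the polygon can be computed directly from the class intervals. The following is a minimal sketch under stated assumptions: the function name `polygon_points`, the classes, and the frequencies are all hypothetical, the classes are assumed to have equal widths, and the polygon is closed by adding a zero-frequency point one class width beyond each end, which is a common convention.

```python
def polygon_points(classes, frequencies):
    """Return (mid value, frequency) points for a frequency polygon.

    classes: list of (lower, upper) boundaries with equal widths.
    """
    width = classes[1][0] - classes[0][0]
    # Mid value of each class interval, paired with its frequency.
    points = [((lo + hi) / 2, f) for (lo, hi), f in zip(classes, frequencies)]
    # Close both flanks at the mid values of the neighbouring empty classes.
    left = (points[0][0] - width, 0)
    right = (points[-1][0] + width, 0)
    return [left] + points + [right]

# Hypothetical classes and frequencies, for illustration only.
classes = [(10, 20), (20, 30), (30, 40)]
frequencies = [4, 9, 5]
print(polygon_points(classes, frequencies))
# [(5.0, 0), (15.0, 4), (25.0, 9), (35.0, 5), (45.0, 0)]
```

Joining these points in order with straight line segments, in any plotting tool, gives the frequency polygon.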
Example
In a batch of 400 students, the height of students is given in the following table.
Represent it through a frequency polygon.
Solution: The following steps are used to construct a histogram from the given
data:
The heights are represented on the horizontal axis on a suitable scale as
shown.
The number of students is represented on the vertical axis on a suitable scale
as shown.
Now rectangular bars of widths equal to the class- size and the length of the