Unit 1 Fod
UNIT II
DESCRIBING DATA
COMPILED BY,
VERIFIED BY
➢ Types of Data
➢ Types of Variables
➢ Describing Data with Tables and Graphs
➢ Describing Data with Averages
➢ Describing Variability
➢ Normal Distributions and Standard (z) Score
LIST OF IMPORTANT QUESTIONS
UNIT - I
INTRODUCTION
PART – A
1. Write the three types of data.
2. What is random sampling?
3. What do you mean by descriptive statistics?
4. Write the difference between a parameter and a statistic.
5. Define statistics and state its types.
6. What is descriptive statistics?
7. What is inferential statistics?
8. What do you mean by mean?
9. What do you mean by median?
10. What do you mean by mode?
11. What do you mean by variance?
12. What do you mean by standard deviation?
13. What do you mean by range?
14. What are probability distributions?
15. What do you mean by graphical representations?
PART – B
1. Illustrate the steps in constructing a frequency distribution, and construct
a frequency distribution for grouped data. Specify the real limits for the
lowest class interval in this frequency distribution.
2. The frequency distribution in the table shows the annual incomes in
dollars for a group of college graduates.
a. Construct a histogram.
b. Construct a frequency polygon.
c. Is this distribution balanced or lopsided?
LIST OF IMPORTANT QUESTIONS
UNIT - I
INTRODUCTION
PART – A
4. Write the difference between Parameter and Statistic.
We often speak about the population and the sample, so it is important to know
the terms used to describe each. A parameter is a number that describes the data
from the population, and a statistic is a number that describes the data from a
sample.
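The distinction can be illustrated with a minimal Python sketch. The population values below are made up for illustration, and the variable names are assumptions, not part of the original notes. The same measure (the mean) is a parameter when computed on the whole population and a statistic when computed on a sample.

```python
import random
from statistics import mean

# Hypothetical population: ages of every member of a small club (made-up values).
population = [23, 35, 41, 29, 52, 38, 47, 31, 26, 44]

# A parameter describes the whole population.
population_mean = mean(population)

# A statistic describes a sample drawn from that population.
random.seed(0)  # fixed seed so the same sample is drawn on every run
sample = random.sample(population, 4)
sample_mean = mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", sample_mean)
```

The sample mean will usually differ somewhat from the population mean, which is why the two are given different names.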
1. Descriptive Statistics
2. Inferential Statistics
In simple terms, inferential statistics interprets the descriptive statistics of a
sample by generalizing them to the population.
For example, suppose we are conducting a survey on two-wheeler ownership in a
city. Assume the city has a total population of 5 lakh (500,000) people. We take a
sample of 1,000 people because it is impractical to collect and analyze data on the
entire population.
The survey finds that 800 of the 1,000 people sampled (80%) own two-wheelers. We
can generalize this result to the population and conclude that about 4 lakh (400,000)
of the city's 5 lakh people own two-wheelers.
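The extrapolation in this example is simple proportional arithmetic. A minimal Python sketch follows, reading 5L as 5 lakh (500,000). The function name is a hypothetical label chosen for illustration.

```python
def estimate_population_count(sample_hits, sample_size, population_size):
    """Scale a sample proportion up to the whole population."""
    proportion = sample_hits / sample_size
    return proportion, proportion * population_size

# Figures from the survey example: 800 of 1,000 sampled, population of 5 lakh.
proportion, estimate = estimate_population_count(800, 1000, 500_000)

print(f"sample proportion: {proportion:.0%}")
print(f"estimated count in the population: {estimate:,.0f}")
```

The result is only an estimate: a different sample of 1,000 people would generally give a slightly different proportion.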
15. What do you mean by Graphical representations?
analyze and interpret numerical data. For a single variable (univariate analysis), we have
a bar plot, line plot, frequency plot, dot plot, boxplot, and the normal Q-Q plot. We will
16. Students in a theater arts appreciation class rated the classic film “The Wizard of
Oz” on a 10-point scale, ranging from 1 (poor) to 10 (excellent), as follows:
17. The IQ scores for a group of 35 high school dropouts are as follows:
18. Construct a bar graph for the data shown in the following table
20. What is the interquartile range?
The interquartile range (IQR) contains the second and third quartiles, or the
middle half of our data set. Whereas the range gives us the spread of the whole
data set, the interquartile range gives us the range of the middle half of a data set.
25. What is the use of frequency polygon?
A frequency polygon is a type of line graph in which straight line segments join
points plotted above the midpoints of all the class intervals, at heights equal to the
class frequencies. The shape of the resulting line shows the shape of the distribution.
Like ordinary line graphs, frequency polygons are widely used when two or more sets
of data need to be compared.
3. Round off to the nearest convenient interval (such as 1, 2, 3, ..., 10,
particularly 5 or 10 or multiples of 5 or 10). In the present example, the
nearest convenient interval is 10.
4. Determine where the lowest class should begin.
(Ordinarily, this number should be a multiple of the class interval.) In the
present example, the smallest score is 133, and therefore the lowest class
should begin at 130, since 130 is a multiple of 10 (the class interval).
5. Determine where the lowest class should end.
Add the class interval to the lower boundary and then subtract one unit of
measurement. In the present example, add 10 to 130 and then subtract 1,
the unit of measurement, to obtain 139, the number at which the lowest
class should end.
6. Working upward, list as many equivalent classes as are required to
include the largest observation.
In the present example, list 130–139, 140–149, . . . , 240–249, so that the last
class includes 245, the largest score.
7. Indicate with a tally the class in which each observation falls.
For example, the first score in Table 1.1, 160, produces a tally next to
160–169; the next score, 193, produces a tally next to 190–199; and so on.
8. Replace the tally count for each class with a number, the frequency (f),
and show the total of all frequencies.
(Tally marks are not usually shown in the final frequency distribution.)
9. Supply headings for both columns and a title for the table.
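Steps 4 to 9 above can be sketched in Python. This is a minimal illustration, assuming whole-number scores (unit of measurement 1) and a class interval that has already been chosen. The function name `grouped_frequency` is hypothetical, and most of the scores are made up: only 133, 160, 193 and 245 are values mentioned in the text.

```python
from collections import Counter

def grouped_frequency(scores, width):
    """Return (lower, upper, frequency) rows for whole-number scores."""
    # Step 4: start the lowest class at a multiple of the class interval.
    start = (min(scores) // width) * width
    # Steps 7-8: tally each score into its class, then count the tallies.
    tallies = Counter((s - start) // width for s in scores)
    rows = []
    lower = start
    # Steps 5-6: each class ends one unit below the next lower boundary;
    # keep listing classes until the largest score is included.
    while lower <= max(scores):
        upper = lower + width - 1
        rows.append((lower, upper, tallies[(lower - start) // width]))
        lower += width
    return rows

# Illustrative scores only (not the data set from Table 1.1).
scores = [160, 193, 133, 245, 168, 171, 158, 199, 204, 165]
table = grouped_frequency(scores, width=10)

# Step 9: headings for both columns, plus the total of all frequencies.
print("Class     f")
for lower, upper, f in table:
    print(f"{lower}-{upper}  {f:>3}")
print("Total   ", sum(f for _, _, f in table))
```

Classes with no observations are still listed with a frequency of 0, so the table keeps equal, contiguous class intervals from 130–139 up to 240–249.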
The IQ scores for a group of 35 high school dropouts are as follows:
(a) Construct a frequency distribution for grouped data.
(b) Specify the real limits for the lowest class interval in this frequency
distribution.
(b) 64.5–69.5
The real limits are located at the midpoint of the gap between adjacent tabled
boundaries; that is, one-half of one unit of measurement below the lower tabled
boundary and one-half of one unit of measurement above the upper tabled
boundary. Here, 65 − 0.5 = 64.5 and 69 + 0.5 = 69.5.
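The same rule can be written as a short Python helper. This is a minimal sketch; the function name `real_limits` and the default unit of measurement of 1 are assumptions made for illustration.

```python
def real_limits(lower, upper, unit=1):
    """Real limits lie half a unit of measurement outside the tabled boundaries."""
    half = unit / 2
    return lower - half, upper + half

# Lowest class interval from the answer above: 65-69.
print(real_limits(65, 69))
```

For data recorded to one decimal place, the unit of measurement would be 0.1, so the real limits would lie 0.05 outside the tabled boundaries.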
2. The frequency distribution in the table shows the annual incomes in dollars
for a group of college graduates.
a. Construct a histogram.
b. Construct a frequency polygon.
c. Is this distribution balanced or lopsided?
NOTE: Ordinarily, only either (a) a histogram, or (b) a frequency polygon
would be shown.
When closing the left flank of (b), imagine extending a line to the midpoint of
the first unoccupied class (–10,000 to –1) on the left, but stop the line at the
vertical axis, as shown.
(c) Lopsided.
4. Square each total from Step 3.
5. Add up the figures from Step 4.
The mean (average) of a data set is found by adding all numbers in the data set and
then dividing by the number of values in the set. The median is the middle value when
a data set is ordered from least to greatest. The mode is the number that occurs most
often in a data set.
The mode is the number that appears most often in the given data set.
Determine the mode for the following retirement ages: 60, 63, 45, 63, 65, 70, 55, 63,
60, 65, 63.
Mode=63
The owner of a new car conducts six gas mileage tests and obtains the following
results, expressed in miles per gallon: 26.3, 28.7, 27.4, 26.6, 27.4, and 26.9
Find the mode for these data.
Mode=27.4
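These answers can be checked with Python's standard `statistics` module. The sketch below uses the two data sets given above; computing the mean and median of the retirement ages is an extra illustration, not part of the original questions.

```python
from statistics import mean, median, mode

retirement_ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]
gas_mileage = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]

# Mode: the value that occurs most often.
print("mode of retirement ages:", mode(retirement_ages))
print("mode of gas mileage:", mode(gas_mileage))

# Median: the middle value of the ordered data.
print("median of retirement ages:", median(retirement_ages))

# Mean: the sum of the values divided by the number of values.
print("mean of retirement ages:", round(mean(retirement_ages), 2))
```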
Quartiles segment any distribution that’s ordered from low to high into four
equal parts. The interquartile range (IQR) contains the second and third
quartiles, or the middle half of our data set.
Whereas the range gives us the spread of the whole data set, the interquartile
range gives us the range of the middle half of a data set.
Calculate the interquartile range
The interquartile range is found by subtracting the Q1 value from the Q3 value:
IQR = Q3 − Q1
Q1 is the value below which 25 percent of the distribution lies, while Q3 is the value
below which 75 percent of the distribution lies.
We can think of Q1 as the median of the first half and Q3 as the median of the
second half of the distribution.
Methods for finding the interquartile range
Although there’s only one formula, there are several methods for identifying
the quartiles. We may get a different value for the interquartile range depending on
the method we use.
Here, we’ll discuss two of the most commonly used methods. These methods differ
based on how they use the median.
The procedure for finding the median is different depending on whether our data set is
odd- or even-numbered.
• When we have an odd number of data points, the median is the value in the
middle of our data set. We can choose between the inclusive and exclusive
method.
• With an even number of data points, there are two values in the middle, so the
median is their mean. It’s more common to use the exclusive method in this
case.
While there is little consensus on the best method for finding the interquartile range,
the exclusive interquartile range is never smaller than the inclusive interquartile range.
The exclusive interquartile range may be more appropriate for large samples, while for
small samples, the inclusive interquartile range may be more representative because
it’s a narrower range.
Steps for the exclusive method
To see how the exclusive method works by hand, we’ll use two examples: one with an
even number of data points, and one with an odd number.
Step 2: Locate the median, and then separate the values below it from the
values above it.
With an even-numbered data set, the median is the mean of the two values in the
middle, so we simply divide our data set into two halves.
Q1 is the median of the first half and Q3 is the median of the second half. Since each
of these halves has an odd number of values, there is only one value in the middle of
each half.
Step 4: Calculate the interquartile range.
Step 2: Locate the median, and then separate the values below it from the
values above it.
In an odd-numbered data set, the median is the number in the middle of the list. The
median itself is excluded from both halves: one half contains all values below the
median, and the other contains all the values above it.
Step 4: Calculate the interquartile range.
Almost all of the steps for the inclusive and exclusive method are identical. The
difference is in how the data set is separated into two halves.
The inclusive method is sometimes preferred for odd-numbered data sets because it
doesn’t ignore the median, a real value in this type of data set.
Step 2: Separate the list into two halves, and include the median in both halves.
The median is included as the highest value in the first half and the lowest value in the
second half.
Step 3: Find Q1 and Q3.
Q1 is the median of the first half and Q3 is the median of the second half. Since the
two halves each contain an even number of values, Q1 and Q3 are calculated as the
means of the middle values.
We can see from these examples that using the inclusive method gives us a smaller
IQR. With the same data set, the exclusive IQR is 24, and the inclusive IQR is 20.
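The two methods can be compared with a minimal Python sketch of the median-of-halves procedure described above. The function name `iqr` and the example data are illustrative assumptions. The data set that produces the IQRs of 24 and 20 is not reproduced in these notes, so a made-up odd-numbered data set is used instead.

```python
from statistics import median

def iqr(values, method="exclusive"):
    """Interquartile range using the median-of-halves procedure."""
    if method not in ("exclusive", "inclusive"):
        raise ValueError("method must be 'exclusive' or 'inclusive'")
    data = sorted(values)          # Step 1: order the values from low to high
    half = len(data) // 2
    if len(data) % 2 == 0:
        # Even-numbered data set: simply split into two halves.
        lower, upper = data[:half], data[half:]
    elif method == "exclusive":
        # Odd-numbered data set: leave the median out of both halves.
        lower, upper = data[:half], data[half + 1:]
    else:
        # Odd-numbered data set: include the median in both halves.
        lower, upper = data[:half + 1], data[half:]
    q1, q3 = median(lower), median(upper)   # Step 3: find Q1 and Q3
    return q3 - q1                          # Step 4: IQR = Q3 - Q1

data = [1, 3, 5, 7, 9, 11, 13]  # made-up odd-numbered data set
print("exclusive IQR:", iqr(data, "exclusive"))
print("inclusive IQR:", iqr(data, "inclusive"))
```

In this sketch the two methods differ only for odd-numbered data sets, because an even-numbered data set has no single middle value to include or exclude.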
Normal
Any distribution that approximates the normal shape in panel A of the figure can be
analysed with the aid of the well-documented normal curve.
Bimodal
Any distribution that approximates the bimodal shape may reflect the coexistence
of two different types of observations in the same distribution.
For instance, the distribution of the ages of residents in a
neighbourhood consisting largely of either new parents or their
infants has a bimodal shape.
Positively Skewed
A lopsided distribution caused by a few extreme observations in the positive
direction (to the right of the majority of observations) is a positively skewed
distribution.
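Negatively Skewed
A lopsided distribution caused by a few extreme observations in the negative
direction (to the left of the majority of observations) is a negatively skewed
distribution.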
The distribution of ages at retirement among U.S. job holders has a pronounced
negative skew, with most retirement ages at 60 years or older and relatively few
retirement ages spanning the wide range of ages younger than 60.
Statistics deals with the collection of data and information for a particular
purpose. The tabulation of each run for each ball in cricket gives the statistics of the
game. Tables, graphs, pie-charts, bar graphs, histograms, polygons etc. are used to
represent statistical data pictorially.
To draw a frequency polygon, first draw a histogram and then follow the steps
below:
• Step 1- Choose the class interval and mark the values on the horizontal axis.
• Step 2- Mark the mid value of each interval on the horizontal axis.
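Marking the mid value of each interval amounts to computing (lower + upper) / 2 for every class. The following Python sketch computes the points a frequency polygon would pass through. The class boundaries and frequencies are made up, and the function name `polygon_points` is hypothetical. It assumes at least two classes of equal width, and it closes the polygon at zero frequency over the empty classes on either side.

```python
def polygon_points(classes, frequencies):
    """Return (midpoint, frequency) points for a frequency polygon.

    classes: list of (lower, upper) tabled boundaries with equal width.
    """
    width = classes[1][0] - classes[0][0]
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    # Close both flanks at the midpoints of the neighbouring empty classes.
    xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
    ys = [0] + list(frequencies) + [0]
    return list(zip(xs, ys))

# Illustrative classes and frequencies only.
classes = [(140, 149), (150, 159), (160, 169)]
frequencies = [3, 7, 5]
points = polygon_points(classes, frequencies)

for x, y in points:
    print(x, y)
```

Joining consecutive points with straight line segments gives the polygon.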
Example
In a batch of 400 students, the height of students is given in the following table.
Represent it through a frequency polygon.
Solution: The following steps are used to construct a histogram from the given
data:
• The heights are represented on the horizontal axis on a suitable scale as
shown.
• The number of students is represented on the vertical axis on a suitable scale
as shown.
• Now rectangular bars of widths equal to the class size and the length of the