
Skill-Enhancement Elective Course

(SEC) - Data Analysis


Unit 3
Data visualisation: scatter plots, line
graphs, box plots and other graphical
formats
References:
D. Levine, D. Stephan, K. Szabat: Statistics for Managers Using Microsoft Excel, 8th ed., Pearson (2017): Chapter 2, Sections 2.3 to 2.5
Graphical Visualization of data
• Categorical variables: variables with two or more categories.
• Presentation of data (single variable):
   • to highlight important features of the data;
   • to see whether the data are concentrated in only a few of the categories;
   • to show how categories directly compare to each other (bar chart);
   • to show how categories form parts of a whole (pie or doughnut chart);
   • to present data concentrated in only a few of the categories (Pareto chart).
• Presentation of data (two variables):
   • to highlight direct comparisons (side-by-side chart);
   • to show how categories form parts of a whole (doughnut chart).
2023 Lecture: Ms. Nupur Kataria 2
Graphical Visualization of data

Categorical variable
   Single variable: bar chart, pie chart, doughnut chart, Pareto chart
   Two variables: side-by-side chart, doughnut chart

(Single categorical variable)
The Bar Chart
• A series of bars;
• each bar represents the tallies for a single category;
• the length of each bar represents either the frequency or the percentage of values for a category.
• Example 1: A recent survey asked people how they paid for purchases and other transactions. A summary table tallies the individual responses as frequencies or percentages for each category, as shown:

Bar chart

 Respondents are most likely to pay for purchases with cash followed by debit card.
 Very few people paid by check or electronically.

• Example 2: A sample of 407 retirement funds was studied in a survey that included the variable risk, with the defined categories Low, Average, and High. A summary table of the retirement funds, categorized by risk, is shown:

Bar chart

• The majority of the funds have average risk, followed by high risk.
• Very few of the funds have low risk.

The Pie Chart and the Doughnut Chart
• Both use parts of a circle to represent the tallies of each category of a categorical variable.
• The size of each part, or slice, varies according to the percentage in each category.
• To represent a category as a slice, multiply the category's percentage, written as a proportion, by 360 (the number of degrees in a circle).
• Suppose that, in Example 1 (how people pay), the category cash represents 40% of the sample respondents. For this category, the slice is 144 degrees (= 0.40 × 360).
• The category debit card represents 25%, so its slice is 90 degrees (= 0.25 × 360).
• Pie and doughnut charts differ only in that the latter has a center hole, much like the hole in a real doughnut.
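The slice-angle rule can be sketched in a few lines of Python. This is only an illustration using the two percentages quoted for Example 1; the function name `slice_angle` is a hypothetical helper, not something from the textbook or from Excel.

```python
def slice_angle(percentage):
    """Return the central angle (in degrees) of a pie slice for a category
    that holds `percentage` percent of the data."""
    return percentage * 360 / 100

# Percentages quoted for Example 1 (how people pay)
shares = {"Cash": 40, "Debit card": 25}
angles = {name: slice_angle(pct) for name, pct in shares.items()}

for name, angle in angles.items():
    print(f"{name}: {angle:.0f} degrees")
```

The same function applies unchanged to a doughnut chart, since only the center hole differs.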
 Example 2: More than 70% of the funds are average risk,
about 17% are high risk, and only about 10% are low risk.

Pie Chart Doughnut Chart

The Pareto Chart
 Tallies for each category are plotted as vertical bars
in descending order, according to their frequencies,
and are combined with a cumulative percentage line
on the same chart.
 Helps to visually identify the “vital few” categories
from the “trivial many” categories so that you can
focus on the important categories.
 Cumulative line is plotted at the midpoint of each
category, at a height equal to the cumulative
percentage.
 Can be an effective way to visualize data for many
studies that seek causes for an observed
phenomenon.
 Example 3: A bank study team wants to enhance the user
experience of automated teller machines (ATMs). During
this study, the team identifies incomplete ATM transactions
as a significant issue and decides to collect data about the
causes of such transactions. Causes of incomplete
transactions are collected, stored in ATM Transactions, and
then organized in the summary table as shown:

 To separate out the “vital few” causes from the “trivial many”
causes, the summary table is organized so that the causes of
incomplete transactions appear in descending order by
frequency, along with percentages and cumulative percentages
as required for constructing a Pareto chart:


• The vertical axis on the left represents the percentage due to each cause;
• the vertical axis on the right represents the cumulative percentage.
• Which causes contribute the most to the problem of incomplete transactions?
• Warped card jammed (50.44%) and card unreadable (32.3%) together account for 82.7% of the incomplete transactions.
• Thus, attempts to reduce incomplete ATM transactions due to warped or unreadable cards should produce the greatest payoff.
Example 1:
Method            Percentage   Cumulative percentage
Cash                  40              40
Debit card            25              65
Credit card           17              82
Check                  7              89
Online payment         7              96
Others                 4             100
Total                100

 Cash and debit card account for nearly two-thirds of the responses.
 These two categories combined with credit card account for 82% of the responses.
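The ordering and cumulative-percentage columns that a Pareto chart needs can be derived mechanically from the tallies. The sketch below assumes the Example 1 percentages as input; `pareto_table` is a hypothetical helper name, and the input is deliberately given out of order to show the sorting step.

```python
def pareto_table(counts):
    """Sort categories by descending tally and attach cumulative percentages.

    Returns a list of (category, percentage, cumulative percentage) tuples.
    """
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for name, value in ordered:
        running += value
        rows.append((name, 100 * value / total, 100 * running / total))
    return rows

# Example 1 percentages, listed out of order on purpose
payments = {"Check": 7, "Cash": 40, "Others": 4,
            "Credit card": 17, "Online payment": 7, "Debit card": 25}

rows = pareto_table(payments)
for name, pct, cum in rows:
    print(f"{name:15s} {pct:5.1f} {cum:6.1f}")
```

Reading down the last column until it passes a chosen threshold (for instance 80%) is one way to pick out the "vital few" categories.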
(Two categorical Variables)
Side-by-Side Chart
 Shows bars that represent the categories of
one variable set grouped by the categories of
the second variable.
Example 2: Retirement funds-
 Two types of funds- growth and value;
 visualizes the data for the levels of risk for
growth and value funds;


• Most of the growth funds and of the value funds have average risk;
• more of the growth funds have high risk than low risk, while more of the value funds have low risk than high risk.
Doughnut Chart
• Also visualizes two variables;
• the chart appears as two concentric rings;
• Example 2: Retirement growth and value funds:
   • the proportion of funds with average risk is approximately the same for growth and value funds;
   • the ratio of high-risk to low-risk funds differs greatly between the two.



H.W
Practice problems given in book for section 2.3

 Numerical Variables: variables taking numerical values with
no categories;
 Visualize data using techniques that show the distribution
of values.
 Presentation of data (single variable):
 stem-and-leaf display;
 Histogram;
 percentage polygon;
 cumulative percentage polygon (ogive); and
 Boxplot (To be discussed in chapter 3-Unit 4).
 Presentation of data (Two variables):
 Scatter plot;
 Time series plot.
(Single Variable)
Stem-and-leaf display
 Presents the data as one or more row-wise
stems that represent a range of values.
 Each stem has one or more leaves that branch
out to the right of their stem and represent the
values found in that stem.
 For stems with more than one leaf, the leaves
are arranged in ascending order.
• It lets you see how the data are distributed and where concentrations of data exist.
• Leaves typically present the last significant digit of each value, but sometimes values are rounded.
 Example 4: Suppose data is:
7.42, 6.29, 5.83, 6.50, 8.34, 9.51, 7.10, 6.80,
5.90, 4.89, 6.50, 5.52, 7.90, 8.30, 9.60.
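• Step 1: Arrange the data in ascending order.
• Step 2: Round each value to one decimal place; the stem is the integer part and the leaf is the first decimal digit.
• For 7.42, the stem is 7 and the leaf is 4.
• For the second value, 6.29, the stem is 6 and the leaf is 3 (6.29 rounds to 6.3).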
 Stem and leaf
display
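As a rough check on the hand construction, the grouping for Example 4 can be reproduced in Python. This is a minimal sketch that assumes positive values rounded to one decimal place, as in the example; `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values, rounded to one decimal place, into stems (integer part)
    and leaves (first decimal digit). Leaves come out in ascending order
    because the values are sorted first. Intended for positive values only."""
    groups = defaultdict(list)
    for value in sorted(values):
        tenths = round(value * 10)        # e.g. 6.29 -> 63
        stem, leaf = divmod(tenths, 10)   # 63 -> stem 6, leaf 3
        groups[stem].append(leaf)
    return dict(groups)

# Data from Example 4
data = [7.42, 6.29, 5.83, 6.50, 8.34, 9.51, 7.10, 6.80,
        5.90, 4.89, 6.50, 5.52, 7.90, 8.30, 9.60]

display = stem_and_leaf(data)
for stem in sorted(display):
    print(stem, "|", " ".join(str(leaf) for leaf in display[stem]))
```

The longest row of leaves marks the stem where the data are most concentrated, which for this data set is stem 6.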
 Example 2: Retirement value funds:
 Study the past performance of the value funds. One measure of past
performance is the one-year rate of return percentage.
 stem-and-leaf display of the one-year return percentage for value
funds.
 lowest one-year return was -13.79.
 highest one-year return was 17.75.
 one-year returns were concentrated
between 8 and 12.
 very few of the one-year returns
were above 14.
 more with very low values than very high
values.

Histogram
 Visualizes data as a vertical bar chart;
 Each bar represents a class interval from a
frequency or percentage distribution.
• The numerical variable is displayed along the horizontal (X) axis, and the vertical (Y) axis represents either the frequency, the relative frequency (frequency of the class interval / total frequency), or the percentage of values per class interval.
 No gaps between adjacent bars in a
histogram.
 Example 5: Data on Cost of a Meal at 50
Center City Restaurants and 50 Metro Area
Restaurants.

 Frequency histograms:

 The histogram for center city restaurants shows that the cost
of meals is concentrated between approximately $40 and
$70.
 Eleven meals at center city restaurants cost $70 or more.
 The histogram for metro area restaurants shows that the
cost of meals is concentrated between $20 and $60.
• Very few meals at metro area restaurants cost more than $60.
Example 2: Retirement growth and value funds:
 Compare the past performance of the growth funds and the value funds, using
the one year return percentage variable using frequency histograms.
 Class intervals: -15 to -10 (less than), -10 to -5, -5 to 0, 0 to 5,…..,15 to 20, 20 to
25.

 returns were lower for the growth funds than for value funds.
 The return for both the growth funds and the value funds is concentrated
between 0 and 15, but
 the return for the value funds is more concentrated between 5 and 15 while
 the return for the growth funds is more concentrated between 0 and 15.

Percentage Polygon
 Divide the data of a numerical variable into
two or more groups (Classes);
 uses the midpoints of each class interval to
represent the data of each class and
 then plots their respective class percentages
at mid-points as points on a line along the X
axis.
 allows you to make a direct comparison that
is easier to interpret.
 Example 5: cost of meals at center city and metro
area restaurants.
Meal Cost ($)               Center City                Metro Area
                        Frequency  Percentage     Frequency  Percentage
10 but less than 20          0          0              1          2
20 but less than 30          6         12             14         28
30 but less than 40          5         10             12         24
40 but less than 50          9         18             10         20
50 but less than 60          9         18             10         20
60 but less than 70         10         20              3          6
70 but less than 80          6         12              0          0
80 but less than 90          2          4              0          0
90 but less than 100         3          6              0          0
Total                       50        100             50        100
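The points of a percentage polygon follow directly from the frequencies in the table: each class contributes one point at (class midpoint, class percentage). The sketch below assumes equal class widths of $10 starting at $10, as in Example 5; `polygon_points` is a hypothetical helper name.

```python
def polygon_points(lower_bounds, width, frequencies):
    """Return (midpoint, percentage) pairs for a percentage polygon."""
    total = sum(frequencies)
    return [(lower + width / 2, 100 * freq / total)
            for lower, freq in zip(lower_bounds, frequencies)]

lower_bounds = list(range(10, 100, 10))        # 10, 20, ..., 90
center_city = [0, 6, 5, 9, 9, 10, 6, 2, 3]     # frequencies from the table
metro_area = [1, 14, 12, 10, 10, 3, 0, 0, 0]

center_points = polygon_points(lower_bounds, 10, center_city)
metro_points = polygon_points(lower_bounds, 10, metro_area)

# Points for the 30-40 class interval (midpoint 35)
print(center_points[2], metro_points[2])
```

Because both groups are converted to percentages, the two polygons can share one set of axes even when the group sizes differ.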

 Example 5: cost of meals at center city and metro area restaurants.

 Again note that the center city (blue line) meal cost is concentrated between $40 and
$70 while the metro area (yellow line) meal cost is concentrated between $20 and $60.
 However, unlike the pair of histograms, the polygons allow you to more easily identify
which class intervals have similar percentages for the two groups and which do not.
• At X = $35 (the 30-40 class interval), the center city polygon (the lower one) shows that 10% of the meals cost between $30 and $40, while the metro area polygon (the higher one) shows that 24% of meals at these restaurants cost between $30 and $40.
 Example 2: Retirement growth and value funds:
 To compare the past performance of the growth funds and the value
funds using the one year return percentage variable, construct
percentage polygons for the growth and value funds-

One-Year Return                  Growth funds               Value funds
Percentage                   Frequency  Percentage     Frequency  Percentage
-15 but less than -10            3         1.12            1         0.72
-10 but less than -5             2         0.74            3         2.17
-5 but less than 0              20         7.43            5         3.62
0 but less than 5               55        20.45           21        15.22
5 but less than 10              96        35.69           49        35.51
10 but less than 15             76        28.25           55        39.86
15 but less than 20             16         5.95            4         2.90
20 but less than 25              1         0.37            0         0.00
Total                          269       100.00          138       100.00

 Example 2: Retirement growth and value funds:

 Value funds polygon is to the right of the growth funds polygon.


 Thus, one-year return percentage is higher for value funds than for
growth funds.
 Also, the return for value funds is concentrated between 5 and 15, and
the return for the growth funds is concentrated between 0 and 15.
Cumulative Percentage Polygon (Ogive)
• Uses the cumulative percentage distribution to plot the cumulative percentages along the Y axis.
• Unlike the percentage polygon (which plots midpoints), the lower boundaries of the class intervals of the numerical variable are plotted along the X axis, each at the cumulative percentage of values below it, as points on a line.

Example 5: cost of meals at center city
and metro area restaurants.

Meal Cost ($)                     Center City                             Metro Area
                        Frequency  Percentage  Cumulative     Frequency  Percentage  Cumulative
                                               percentage                            percentage
10 but less than 20          0          0           0              1          2           2
20 but less than 30          6         12          12             14         28          30
30 but less than 40          5         10          22             12         24          54
40 but less than 50          9         18          40             10         20          74
50 but less than 60          9         18          58             10         20          94
60 but less than 70         10         20          78              3          6         100
70 but less than 80          6         12          90              0          0         100
80 but less than 90          2          4          94              0          0         100
90 but less than 100         3          6         100              0          0         100
Total                       50        100                         50        100


Cumulative percentage polygons of meal costs for center city and metro area
restaurants.
 In this chart, the lower boundaries of the class intervals (10, 20, 30, 40, etc.) are
approximated by the upper boundaries of the previous bins (9.99, 19.99, 29.99, 39.99,
etc.).
 Curve of the cost of meals at the center city restaurants is located to the right of the
curve for the metro area restaurants.
 indicates that the center city restaurants have fewer meals that cost less than a particular
value.
 For example, 40% of the meals at center city restaurants cost less than $50, as compared
to 74% of the meals at metro area restaurants.
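The cumulative percentages behind the two curves can be recomputed from the class frequencies with a running sum. This sketch assumes the Example 5 frequencies and keys each cumulative percentage by the upper boundary of its class, so that `center_cum[50]` reads as "percentage of meals costing less than $50"; the helper name is hypothetical.

```python
from itertools import accumulate

def cumulative_percentages(frequencies):
    """Running total of the frequencies, expressed as percentages."""
    total = sum(frequencies)
    return [100 * running / total for running in accumulate(frequencies)]

center_city = [0, 6, 5, 9, 9, 10, 6, 2, 3]    # classes 10-20, 20-30, ..., 90-100
metro_area = [1, 14, 12, 10, 10, 3, 0, 0, 0]
upper_bounds = list(range(20, 110, 10))       # 20, 30, ..., 100

center_cum = dict(zip(upper_bounds, cumulative_percentages(center_city)))
metro_cum = dict(zip(upper_bounds, cumulative_percentages(metro_area)))

# Percentage of meals costing less than $50 in each group
print(center_cum[50], metro_cum[50])
```

A group whose cumulative percentage is lower at every boundary has its curve further to the right, which is the pattern described for the center city restaurants.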

Example 2: Retirement growth and value funds:

One-Year Return                       Growth funds                            Value funds
Percentage                Frequency  Percentage  Cumulative     Frequency  Percentage  Cumulative
                                                 percentage                            percentage
-15 but less than -10         3         1.12        1.12            1         0.72        0.72
-10 but less than -5          2         0.74        1.86            3         2.17        2.90
-5 but less than 0           20         7.43        9.29            5         3.62        6.52
0 but less than 5            55        20.45       29.74           21        15.22       21.74
5 but less than 10           96        35.69       65.43           49        35.51       57.25
10 but less than 15          76        28.25       93.68           55        39.86       97.10
15 but less than 20          16         5.95       99.63            4         2.90      100.00
20 but less than 25           1         0.37      100.00            0         0.00      100.00
Total                       269       100.00                      138       100.00

The cumulative percentage polygons curve for the one-year return
percentage
 Curve for the growth funds is located slightly to the left of the curve for the
value funds.
• This indicates that the growth funds have lower one-year return percentages.
 For example, 65.43% of the growth funds had one-year return percentages
below 10, as compared to 57.25% of the value funds.
 In general, the value funds slightly outperformed the growth funds in their one
year returns.

(Two-variables)
 Visualizing two numerical variables together
can reveal possible relationships between
two variables.
 The Scatter Plot
 Explores the possible relationship between
two numerical variables;
 Plot the values of one numerical variable
on the horizontal, or X, axis and the values of
a second numerical variable on the vertical,
or Y, axis.
Example 6: Suppose that you are an investment analyst who has been asked to review the valuations of the 30 NBA professional basketball teams. You seek to know if the value of a team reflects its revenues. You collect revenue and valuation data (both in $millions) for all 30 NBA teams and organize the data as follows:

Scatter plot of revenue and value for NBA teams

• There appears to be a strong increasing (positive) relationship between revenues and the value of a team.
 In other words, teams that generate a smaller amount
of revenues have a lower value, while teams that
generate higher revenues have a higher value.
 Similarly, other pairs of variables may have a
decreasing (negative) relationship in which one
variable decreases as the other increases.
 In other situations, there may be a weak or no
relationship between the variables.
 The Time-Series Plot
 Plots the values of a numerical variable on the
Y axis and plots the time period associated with
each numerical value on the X axis.
 Helps to visualize trends in data that occur
over time.
 Example 7: As an investment analyst who specializes in the
entertainment industry, you are interested in discovering any long-term
trends in movie revenues. You collect the annual revenues (in $billions)
for movies released from 1995 to 2014, organize the data as:

 Time – series plot →



H.W
Practice problems given in book for
sections 2.4 and 2.5



Q. 2.90
A survey was completed by senior-level marketers on marketer
expectations and perspectives going into the next year for such things
as marketing spending levels, media usage, and new business activities.
Marketers were asked about how they most often find out about new
marketing agencies for hire and the value they are placing on marketing
agencies that specialize in their industry. The results are presented in
the following tables:

a. Construct a bar chart, a pie chart, a doughnut chart, and a Pareto chart.
b. Which graphical method do you think is best for portraying these data?



a. [Figures: bar chart, Pareto chart (percentage and cumulative percentage), pie chart, and doughnut chart of "Most Often Way to Find Out About New Marketing Agencies". Categories: referrals from friends, colleagues; calls/emails from agencies; searching on Google, Bing; agency search consultants; social outreach. Slice labels shown: 48%, 32%, 7%, 7%, 6%.]


b) The pie chart may be best since with only five categories, it
enables you to see the portion of the whole in each
category.

c) Construct a bar chart, a pie chart, a doughnut chart, and a Pareto chart.
d) Which graphical method do you think is best for portraying
these data?
e) What conclusions can you reach concerning marketers’
perspective on new marketing agencies?
[Figures: bar chart, pie chart, doughnut chart, and Pareto chart (percentage and cumulative percentage) of "Importance of Marketing Agency Specializing in Marketer's Industry". Categories in Pareto order: somewhat important, very important, not at all important. Slice labels shown: 45%, 43%, 12%.]



(d) The pie chart may be best since, with only
three categories it enables you to see the
portion of the whole in each category.
(e) Marketers mostly find out about new
marketing agencies from calls/emails from
agencies and referrals from friends and
colleagues.
 Almost 90% believe that it is important for a
marketing agency to specialize in the
marketer’s industry.



2.32: Do social recommendations increase ad effectiveness? A
study of online video viewers compared viewers who arrived
at an advertising video for a particular brand by following a
social media recommendation link to viewers who arrived at
the same video by web browsing. Data were collected on
whether the viewer could correctly recall the brand being
advertised after seeing the video. The results were as follows:

a. Construct a side-by-side bar chart and a doughnut chart of the arrival method and whether the brand was promptly recalled.
b. What do these results tell you about the arrival method and brand
recall?



Arrival Method      Correctly Recalled the Brand      For doughnut chart
                    Yes      No      Total            Percentage (yes)    Percentage (no)
Recommendation      407      150     557              407/557 = 73%       150/557 = 27%
Browsing            193       91     284              193/284 = 68%        91/284 = 32%
Total               600      241     841
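The ring percentages for the doughnut chart are row percentages of the contingency table: each count divided by its own row total. A minimal Python sketch, assuming the counts above and rounding to whole percentages as the slide does (`row_percentages` is a hypothetical helper name):

```python
# Counts from the table: arrival method by brand recall
recall = {
    "Recommendation": {"yes": 407, "no": 150},
    "Browsing": {"yes": 193, "no": 91},
}

def row_percentages(table):
    """Express each cell as a whole-number percentage of its row total."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {key: round(100 * count / row_total)
                       for key, count in cells.items()}
    return result

shares = row_percentages(recall)
for row, cells in shares.items():
    print(row, cells)
```

Using row percentages rather than raw counts matters here because the two arrival groups differ in size (557 versus 284 viewers).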

[Figures: side-by-side chart of arrival method and whether the brand was promptly recalled (X axis: frequency, 0 to 450); doughnut chart of arrival method and whether the brand was promptly recalled, with browsing on the outer ring. Ring labels: recommendation 73% yes, 27% no; browsing 68% yes, 32% no.]

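b) Social recommendations had a greater impact on correct recall: 73% of viewers who arrived by recommendation recalled the brand, compared with 68% of those who arrived by browsing.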



Q 2.96: The data file contains the percentage alcohol, number of calories per 12
ounces, and number of carbohydrates (in grams) per 12 ounces for 171 of the
best selling medicines in the United States.
a. Construct a percentage histogram, percentage polygon and cumulative
percentage polygon for percentage alcohol and number of calories per 12
ounces.
b. Construct scatter plot: percentage alcohol versus calories.
c. Discuss what you learned from studying the graphs in (a) and (b).
Ans. a) Percentage histogram: Alcohol

CI           Mid-point   Frequency   Percentage (%)   Cumulative percentage
0.02-0.03      0.025          3          1.75%              1.75%
0.03-0.04      0.035          4          2.34%              4.09%
0.04-0.05      0.045         95         55.56%             59.65%
0.05-0.06      0.055         44         25.73%             85.38%
0.06-0.07      0.065         12          7.02%             92.40%
0.07-0.08      0.075          6          3.51%             95.91%
0.08-0.09      0.085          4          2.34%             98.25%
0.09-0.1       0.095          3          1.75%            100.00%
Total                       171        100.00%

[Figure: percentage histogram (Alcohol); X axis: mid-points, Y axis: percentage]
Percentage histogram: number of calories per 12 ounces.

CI         Mid-point   Frequency   Percentage (%)   Cumulative percentage
50-100         75          12            7%                 7%
100-150       125          71           42%                49%
150-200       175          68           40%                88%
200-250       225          15            9%                97%
250-300       275           2            1%                98%
300-350       325           3            2%               100%
Total                     171          100%

[Figure: percentage histogram (Calories); X axis: mid-points, Y axis: percentage]

Percentage Polygon:
[Figures: percentage polygon (Alcohol), X axis: Alcohol (%), Y axis: Percentage; percentage polygon (Calories), X axis: Calories, Y axis: Percentage]
Cumulative Percentage Polygon:
[Figures: cumulative percentage polygon (Alcohol), X axis: Alcohol (%), Y axis: Cumulative Percent; cumulative percentage polygon (Calories), X axis: Calories, Y axis: Cumulative Percent]



b) Scatter plot: percentage alcohol versus calories.
[Figure: scatter plot; X axis: Alcohol (%), Y axis: Calories]
c)
 The alcohol percentage is concentrated between 4% and 6%, with more
between 4% and 5%.
• The calories are concentrated between 100 and 200, with slightly more between 100 and 150.
 There are outliers in the percentage of alcohol in both tails.
 There are a few medicines with calorie content as high as around 327.5.
 There is a strong positive relationship between percentage of alcohol and
calories.



Q :Scott measured the height of all the people in his group. The values, in
cm, are given below:
154,180,176,153, 162,165,154,186,190,187,176,176,172,182,177,169
a. Draw a stem-and-leaf diagram to represent Scott’s data.
b. What can you observe from this display?
Ans. a. Step 1: The stem will be the first two digits, while the leaf will be
the last digit.
Step 2: Rewrite the data in ascending order:
153,154,154,162, 165,169,172,176,176,176,177,180,182,186,187,190.
Step 3: The first two digits become the stem and the last digit is the leaf.
Stem | Leaf
15   | 3 4 4
16   | 2 5 9
17   | 2 6 6 6 7
18   | 0 2 6 7
19   | 0
b. Heights are concentrated between 172 and 187 cm (inclusive).
Only one person with height 190cm.
Min. value = 153cm
Max. value = 190cm
 Q 2.104: The following data represent the amount of soft drink in a
sample of 50 consecutively filled 2-liter bottles. The results are listed
horizontally in the order of being filled:
2.109 2.086 2.066 2.075 2.065 2.057 2.052 2.044 2.036 2.038 2.031
2.029 2.025 2.029 2.023 2.020 2.015 2.014 2.013 2.014 2.012 2.012
2.012 2.010 2.005 2.003 1.999 1.996 1.997 1.992 1.994 1.986 1.984
1.981 1.973 1.975 1.971 1.969 1.966 1.967 1.963 1.957 1.951 1.951
1.947 1.941 1.941 1.938 1.908 1.894
a. Construct a time-series plot for the amount of soft drink on the
Y axis and the bottle number (going consecutively from 1 to 50) on the
X axis.
b. What pattern, if any, is present in these data?
c. If you had to make a prediction about the amount of soft drink
filled in the next bottle, what would you predict?
d. Based on the results of (a) through (c), explain why it is important
to construct a time-series plot and not just a histogram.



Ans. a)
[Figures: time-series plot of the amount of soft drink (Y axis: Amount) against bottle number (X axis: Bottle Number, 1 to 50); histogram of the amount of soft drink (Y axis: Frequency)]

b) There is a downward trend in the amount filled.


c) The amount filled in the next bottle will most likely be
below 1.894 liters.
d) The time series plot of the amount of soft drink filled
reveals the trend of the data, whereas a histogram only
provides information on the distribution of the data.
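The point in (d) can be illustrated numerically. The sketch below is not part of the textbook solution: it uses a least-squares slope as one possible numerical stand-in for the trend that the slide reads off the plot, and it reads the fused entry in the data listing as the two values 2.014 and 2.012. Reversing the filling order leaves the set of values (and therefore any histogram) unchanged but flips the sign of the slope.

```python
# Amounts in filling order (fused entry read as 2.014 and 2.012: an assumption)
amounts = [
    2.109, 2.086, 2.066, 2.075, 2.065, 2.057, 2.052, 2.044, 2.036, 2.038, 2.031,
    2.029, 2.025, 2.029, 2.023, 2.020, 2.015, 2.014, 2.013, 2.014, 2.012, 2.012,
    2.012, 2.010, 2.005, 2.003, 1.999, 1.996, 1.997, 1.992, 1.994, 1.986, 1.984,
    1.981, 1.973, 1.975, 1.971, 1.969, 1.966, 1.967, 1.963, 1.957, 1.951, 1.951,
    1.947, 1.941, 1.941, 1.938, 1.908, 1.894,
]

def least_squares_slope(ys):
    """Slope of the least-squares line through (1, y1), (2, y2), ..."""
    n = len(ys)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator

slope = least_squares_slope(amounts)
reversed_slope = least_squares_slope(amounts[::-1])

print(f"{len(amounts)} bottles, slope = {slope:.4f} liters per bottle")
print("same values in reverse order, slope =", f"{reversed_slope:.4f}")
print("same histogram input:", sorted(amounts) == sorted(amounts[::-1]))
```

A negative slope in filling order matches the downward trend noted in (b); the histogram cannot show it because it discards the order of the observations.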



Next :
1. Excel practical.
2. Use excel to solve questions discussed in class.

H.W
Practice back questions of the chapter for
relevant sections
