Introduction to Statistics Overview
Introduction to Statistics & Graphs
Chapter Outline
• Introduction:
Definition of statistics.
• Descriptive and Inferential Statistics:
Differentiate between the two branches of statistics,
Statistical terms.
• Variables and Types of Data:
Identify types of data.
• Importance of Statistics in different fields
Example # 1:
Identifying Data Sets
In a recent survey, 1500 adults in the United States were asked if they thought there was
solid evidence of global warming. Eight hundred fifty-five of the adults said yes. Identify
the population and the sample. Describe the sample data set. (Adapted from Pew Research Center)
Solution
The population consists of the responses of all adults in the United States, and the sample
consists of the responses of the 1500 adults in the United States in the survey. The
sample is a subset of the responses of all adults in the United States. The sample data set
consists of 855 yes’s and 645 no’s.
Example # 2:
Distinguishing Between a Parameter and a Statistic.
Decide whether the numerical value describes a population parameter
or a sample statistic. Explain your reasoning.
1. A recent survey of 200 college career centers reported that the
average starting salary for petroleum engineering majors is $83,121.
2. The 2182 students who accepted admission offers to Northwestern
University in 2009 have an average SAT score of 1442.
3. In a random check of a sample of retail stores, the Food and Drug
Administration found that 34% of the stores were not storing fish at
the proper temperature.
Solution
1. Because the average of $83,121 is based on a subset of the
population, it is a sample statistic.
2. Because the SAT score of 1442 is based on all the students who
accepted admission offers in 2009, it is a population parameter.
3. Because the percent of 34% is based on a subset of the population, it
is a sample statistic.
PRACTICE QUESTIONS
Identify the population and the sample. Identify a parameter and a statistic.
• In a recent survey, 3002 American adults were asked if they read news on the Internet at least once a week. Six hundred of the adults said yes.
• The annual salary for each employee at a company.
• The speed of every fifth car passing a police speed trap.
• A survey of 1420 U.S. undergraduate English majors asked which Shakespearean play was most relevant in the year 2004.
• A survey of 500 students from a university with 2000 students.
• A recent survey of a sample of MPhil’s reported that the average starting salary for an MPhil is less than $65,000.
• Starting salaries for the 667 MS graduates from the University of Chicago School of Business increased 8.5% from the previous year.
• In 2007, the interest category for 12% of all new magazines was sports.
• The average annual salary for 35 of a company’s 1200 accountants is $57,000.
Example # 3: A survey conducted among 1017 men and women by
Opinion Research Corporation found that 76% of women and 60% of
men had a physical examination within the previous year. (Source: Men’s Health)
(a) Identify the descriptive aspect of the survey.
(b) What inferences could be drawn from this survey?
PRACTICE QUESTION
PRACTICE
( Basic Skills & Concepts )
• How is a sample related to a population?
A sample is a subset of a population.
• Why is a sample used more often than a population?
It is usually impractical (too expensive and time-consuming) to obtain all the population data.
• What is probability? Name two areas where probability is used.
Probability deals with events that occur by chance. It is used in insurance and gambling.
• Give three reasons why samples are used in statistics.
1. Sampling saves time. 2. Sampling saves money. 3. Sampling must be used when the population is infinite.
PRACTICE
True or False
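1. A statistic is a measure that describes a population characteristic.
Answer: False. A statistic describes a sample characteristic; a measure that describes a population characteristic is a parameter.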
Question # 1
One airline claims that less than 1% of its scheduled
flights out of Orlando International Airport depart late.
From a random sample of 200 flights, 1.5% were
found to depart later than the scheduled time.
(i) What is the population?
(ii) What is the sample?
(iii) What is the statistic?
(iv) Is 1.5% a parameter or a statistic?
Question # 2
Your university surveyed its students to determine
average weekly time spent surfing the Internet. From a
random sample of 174 students the average time was
computed to be 6.1 hours.
(i) What is the population?
(ii) What is the sample?
(iii) What is the statistic?
(iv) Is the value 6.1 hours a parameter or a statistic?
Question # 3
Determine if descriptive or inferential statistics should be used
to obtain the following information.
a. A graph that shows the number of defective bottles
produced during the day shift over one week’s time.
• Descriptive -- To describe information about a one-week sample.
b. An estimate of the percentage of employees who arrive to
work late.
• Inferential -- To estimate the true percentage of all employees who arrive to work late.
c. An indication of the relationship between years of
employee experience and pay scale.
• Inferential – To predict the relationship between years of experience and pay scale.
Constant.
A quantity which can assume only one value is called a constant, for example e = 2.71828 and π = 3.14159.
Variable.
A measurable quantity which changes from one individual to another is called a variable.
Discrete Variable.
A variable which can take only certain specific values within a given range is called a discrete variable.
Continuous Variable.
A variable which can take any value within a given range is called a continuous variable.
Attribute.
A characteristic which cannot be measured numerically, and of which only the presence or absence can be
described, is called an attribute.
Frequency Distribution.
A frequency distribution is a table in which the values of a variable are grouped into classes and the observed
frequencies are recorded.
Observation.
Any numerical reading or measurement recorded in the course of research.
Data.
A single observation is known as a datum, and more than one observation is called data.
Ungrouped Data/Raw Data.
Data which have not been condensed into a frequency distribution are called ungrouped data or raw
data.
Grouped Data.
Data which have been condensed into a frequency distribution are called grouped data.
Primary Data.
Data obtained from the original source by direct observation are called primary data.
Secondary Data.
Data which have undergone any sort of statistical treatment at least once are called
secondary data.
Permutations.
An arrangement of ‘r’ objects taken from ‘n’ distinct objects in a particular order is called a permutation and is
denoted by $^{n}P_{r}$.
Combinations.
A selection of ‘r’ objects taken from ‘n’ distinct objects without regard to order is called a
combination and is denoted by $^{n}C_{r}$.
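As a quick numerical check of the two definitions, the sketch below counts ordered arrangements and unordered selections with Python's standard library (`math.perm` and `math.comb`, available from Python 3.8) and confirms the counts by listing them. The values n = 5 and r = 2 and the object labels A to E are illustrative assumptions, not taken from the text.

```python
import math
from itertools import combinations, permutations

n, r = 5, 2  # illustrative values (assumed, not from the text)

# Closed-form counts: nPr = n! / (n - r)!  and  nCr = n! / (r! (n - r)!)
n_p_r = math.perm(n, r)
n_c_r = math.comb(n, r)

# Brute-force check: list every arrangement and every selection
objects = "ABCDE"[:n]
listed_permutations = list(permutations(objects, r))
listed_combinations = list(combinations(objects, r))

print(n_p_r, len(listed_permutations))   # ordered arrangements
print(n_c_r, len(listed_combinations))   # unordered selections
```

Each selection of r objects can be ordered in r! ways, which is why the permutation count is r! times the combination count.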
Presentation of Data.
(i) Classification of data (ii) Tabulation (iii) Graphical Display
The process of arranging observations into homogeneous groups is known as classification. A table is a systematic
arrangement of data into columns and rows, and the process of arranging data into rows and columns is
called tabulation.
Aims of Classification. The main aims of classification are:
(i) To convert large sets of data into an easily understood summary.
(ii) To provide the ground for comparison and inference.
(iii) To eliminate unimportant details.
(iv) To show the points of similarity and dissimilarity, and
(v) To reflect the important aspects of the data.
Diagrammatic and Graphic Representation.
Diagrammatic and graphic representation is the pictorial representation of numerical data. Diagrams and
graphs leave a more effective and longer-lasting impression than plain figures do. They help the reader in
understanding the shape of the distribution of the data.
Graph means the drawing of geometrical curves in conformity with the given data. It is a representation of data by
a continuous curve. Diagram means the translation of statistical figures into a geometrical figure. It is a one-, two- or
three-dimensional form of visual representation.
Important diagrams are bar diagrams, rectangles and pie diagrams. Similarly, important graphs of a frequency
distribution are the histogram, Pareto chart, frequency polygon, frequency curve, ogive (cumulative frequency
polygon) and historigram.
Model.
A model is a relationship between variables which is intended to represent a real-life process, situation or
problem.
Decimal Places.
The figures to the right of the decimal point that are required to express the magnitude of a number to a specified
degree of accuracy. For example, 3.1415927 written to four decimal places is 3.1416; i.e., there are four
digits after the decimal point, namely 1, 4, 1 and 6 (where the final digit reflects the rounding up or down
of any subsequent digits).
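The rounding in this example can be reproduced with a minimal sketch using Python's `decimal` module. The helper name `to_decimal_places` is hypothetical, and the half-up rounding rule is an assumption, since the text does not say how ties are handled.

```python
from decimal import Decimal, ROUND_HALF_UP

def to_decimal_places(value: str, places: int) -> Decimal:
    """Round a number given as a decimal string to `places` decimal places (half up)."""
    quantum = Decimal(10) ** -places          # e.g. places=4 gives 0.0001
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)

print(to_decimal_places("3.1415927", 4))
print(to_decimal_places("3.1415927", 2))
```

Passing the number as a string avoids binary floating-point representation error before the rounding step.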
Methods of Collecting Primary Data:
Primary data may be collected by the following methods or from the following sources:
(i) Direct Personal Investigation: Through personal investigation, an investigator collects the information directly from the
individual concerned. Information collected through this method is quite accurate and complete, but the method is
expensive and slow.
(ii) Indirect Personal Investigation: Sometimes individuals hesitate to give correct information. In such
situations, third parties or witnesses who have the information are interviewed. This method is useful when the desired
information is difficult to obtain directly.
(iii) Collection Through Questionnaires: In this method a questionnaire is sent to the individuals by post with a letter
requesting the required information. This method is cheap but does not succeed in areas where people are not literate
and cooperative. Returned questionnaires are sometimes incomplete; even so, the method is used successfully in large-scale
government inquiries.
(iv) Collection Through Enumerators: In this method information is collected through enumerators. Enumerators go to the
individuals and assist them in filling in the questionnaire correctly. This method reduces the difficulties of illiterate people.
It is a very costly method of inquiry and can generally be carried out only by the government.
(v) Registration: In this method the information is reported for registration to an appropriate authority shortly after the event
occurs. This method is used for deaths, births, marriages, divorces, sickness, etc.
(vi) Collection through Local Sources: Through this method local agents send the required information using their own
judgment. This technique is not reliable; it is generally used in crop estimation.
(vii) Computer Interviews: Individuals enter data directly into a computer in response to questions presented on the monitor.
Numerical Data 41, 24, 32, 26, 27, 27, 30, 24, 38, 21
• Select a starting point for the lowest class limit. This can be the
smallest data value. Add the width to the lowest data value to
get the lower limit of the next class. Keep adding until you have
the desired number of classes. Subtract one unit from the lower limit
of the second class to get the upper limit of the first class. Then
add the width to each upper limit to get all the upper limits.
• Tally the data
• Find the numerical frequencies from the tallies
• Find the class boundaries
• Find the cumulative frequencies
• Find the class mark (X)
• Find the relative frequency
Example # 5:
(a) The following data represent the record high temperatures for each of the 50 states. Construct
frequency distribution table with 7 classes.
(b) Find the Class marks, relative frequencies, class boundaries & cumulative frequencies.
112 100 127 120 134 118 105 110 109 112 110 118 117 116
118 122 114 114 105 109 107 112 114 115 118 117 118 122
106 110 116 108 110 121 113 120 119 111 104 111 120 113
120 117 105 110 118 112 114 114
Solution
H = 134, L = 100, R = H – L = 134 – 100 = 34
No of classes = 7
h = R / (no of classes) = 34 / 7 = 4.9 ≈ 5 (rounded up).
Select the starting point for the lowest class limit. In this case 100 is used. Add
the width (h) to the starting point, keep adding until there are 7 classes.
Frequency Distribution Table
Classes     Tally                  f    X     Relative Frequency   Class Boundaries   Cumulative Frequency (C)
100 - 104   ∕∕                     2    102   0.04                 99.5 – 104.5       2
105 - 109   ∕∕∕∕ ∕∕∕               8    107   0.16                 104.5 – 109.5      10
110 - 114   ∕∕∕∕ ∕∕∕∕ ∕∕∕∕ ∕∕∕     18   112   0.36                 109.5 – 114.5      28
115 - 119   ∕∕∕∕ ∕∕∕∕ ∕∕∕          13   117   0.26                 114.5 – 119.5      41
120 - 124   ∕∕∕∕ ∕∕                7    122   0.14                 119.5 – 124.5      48
125 - 129   ∕                      1    127   0.02                 124.5 – 129.5      49
130 - 134   ∕                      1    132   0.02                 129.5 – 134.5      50
Totals                             50   -     1.00                 -                  -
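The table above can be reproduced with a short sketch. The function name `frequency_table` and its parameters are hypothetical helpers introduced for illustration; the data, starting point (100), width (5) and number of classes (7) come from Example # 5.

```python
temperatures = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112, 110, 118, 117, 116,
    118, 122, 114, 114, 105, 109, 107, 112, 114, 115, 118, 117, 118, 122,
    106, 110, 116, 108, 110, 121, 113, 120, 119, 111, 104, 111, 120, 113,
    120, 117, 105, 110, 118, 112, 114, 114,
]

def frequency_table(data, start, width, n_classes, unit=1):
    """Build the rows of a grouped frequency distribution.

    start: lower limit of the first class; width: class width h;
    unit: smallest step in the data (1 for whole numbers).
    """
    rows, cumulative = [], 0
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width - unit            # upper class limit
        f = sum(lower <= x <= upper for x in data)
        cumulative += f
        rows.append({
            "limits": (lower, upper),
            "f": f,
            "mark": (lower + upper) / 2,        # class mark X
            "rel_f": f / len(data),             # relative frequency
            "boundaries": (lower - unit / 2, upper + unit / 2),
            "cum_f": cumulative,                # cumulative frequency C
        })
    return rows

rows = frequency_table(temperatures, start=100, width=5, n_classes=7)
for row in rows:
    print(row)
```

The `unit` parameter only suits whole-number data as written; for data recorded to one decimal place, exact decimal arithmetic would be a safer choice than floats.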
Example # 6: From the following data construct a frequency distribution
using 6 classes. Indicate the class boundaries and cumulative frequencies.
54.6 59.1 70.5 68.1
68.5 60.4 64.0 59.2
60.2 62.1 59.1 59.2
67.0 57.1 55.1 55.9
48.5 59.3 64.0 63.0
Solution
H = 70.5, L = 48.5, R = H – L = 70.5 – 48.5 = 22
No of classes = 6
h = R / (no of classes) = 22 / 6 = 3.7 ≈ 4 (rounded up).
Taking the starting value 48.5 as the lowest class limit, keep adding 4
until there are 6 classes.
Subtract one unit (here 0.1, since the data are recorded to one decimal place)
from the lower limit of the second class to get the upper class limit of the
first class, 52.5 – 0.1 = 52.4; then add h to each upper limit
to get all the upper limits. Tally the data. Find the numerical
frequencies from the tallies.
Class limits   Tally      f    Class Boundaries   C
48.5 – 52.4    I          1    48.45 – 52.45      1
52.5 – 56.4    III        3    52.45 – 56.45      4
56.5 – 60.4    IIII III   8    56.45 – 60.45      12
60.5 – 64.4    IIII       4    60.45 – 64.45      16
64.5 – 68.4    II         2    64.45 – 68.45      18
68.5 – 72.4    II         2    68.45 – 72.45      20
Total          –          20   –                  –
Question # 4
From the following data construct frequency distribution table with six classes.
Also find the Class marks, relative frequencies, class boundaries & cumulative
frequencies. 13, 6, 9, 3, 19, 8, 10, 11, 12, 8, 14, 16, 17, 10, 2, 5, 9, 7, 6, 10.
Exercises
Question # 5.
Find the class boundaries, class marks and class widths for the
following intervals.
(i) 7 – 13 (ii) 10.4 – 18.7 (iii) (-5) – (-1)
(iv) (-2.75) – 1.35 (v) 0.346 – 0.418 (vi) 78.49 – 86.72
Question # 6.
In a music competition, students are asked to rate the music on
five points scale A, B, C, D, E, where A represents the maximum
enjoyment and E represents minimum enjoyment. The ratings are
A, D, A, D, E, B, C, D, A, B, B, C, E, A, C, E, C, A, B, E, D, E, B,
A, B, E, E, C, B, A. Construct a frequency distribution for the
above rating.
Question # 7.
From the following data construct frequency distribution table with
five classes. Also find the Class marks, relative frequencies, class
boundaries & cumulative frequencies.
17, 14, 15, 14, 13, 10, 14, 7, 8, 10, 6, 25, 18, 21.
Answer: Here range is 25 – 6 = 19 & no of classes = 5, so h = 19/5 = 3.8 ≈ 4. Classes 6 – 9, 10 – 13, 14 – 17, 18 – 21, 22 – 25. Frequency f = 3, 3, 5, 2, 1; total = 14.
X = 7.5, 11.5, 15.5, 19.5, 23.5. R.f = 3/14, 3/14, 5/14, 2/14, 1/14. C.B = 5.5 – 9.5, 9.5 – 13.5, 13.5 – 17.5, 17.5 – 21.5, 21.5 – 25.5. C = 3, 6, 11, 13, 14.
1. Histogram. A histogram is a bar graph of a frequency distribution.
2. Frequency Polygon. A frequency polygon is a line graph of a frequency distribution in which the frequencies
are plotted against the midpoints of the classes.
3. Ogive. A graph showing the cumulative frequencies against the upper class boundaries is called
an ogive or cumulative frequency polygon.
4. Pie Chart. A pie chart is a circle that is divided into sectors according to the percentage of frequencies in
each category of the distribution.
5. Time Series Graph. A time series graph represents data over a specific period of time.
6. Pareto Chart. This chart is used to represent a frequency distribution for a categorical variable; the
frequencies are displayed by the heights of vertical bars, which are arranged in order from
highest to lowest.
Example # 7: Using example # 5 construct Histogram, frequency polygon,
& ogive.
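[Figure: Histogram. Vertical axis: Frequency (0 to 18); horizontal axis: Temperature, marked at the class boundaries 99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5.]
[Figure: Frequency Polygon. Vertical axis: Frequency (0 to 18); horizontal axis: Temperature, marked at the class marks 102, 107, 112, 117, 122, 127, 132.]
[Figure: Ogive. Vertical axis: Cumulative Frequency (0 to 50); horizontal axis: Temperature, marked at the class boundaries 99.5 to 134.5.]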
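The points to plot for the three graphs follow directly from the Example # 5 table. The sketch below only computes the coordinates (no plotting library is assumed); the variable names are illustrative.

```python
# Class boundaries and frequencies from the Example # 5 table
boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Histogram: one bar per class, spanning its lower and upper boundary
bars = [(boundaries[i], boundaries[i + 1], f) for i, f in enumerate(frequencies)]

# Frequency polygon: frequency plotted against the class midpoint
polygon = [((lo + hi) / 2, f) for lo, hi, f in bars]

# Ogive: cumulative frequency plotted against the upper class boundary,
# starting from 0 at the first lower boundary
ogive = [(boundaries[0], 0)]
running = 0
for _, hi, f in bars:
    running += f
    ogive.append((hi, running))

print(polygon)
print(ogive)
```

Any plotting tool can then draw bars from `bars` and connect the `polygon` and `ogive` points with straight line segments.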
Example # 8: Draw an ogive for the following frequency distribution.
Estimate how many GPS navigators cost $300 or less. Also, use the graph
to estimate when the greatest increase in price occurs.
Classes Class Boundaries f
59 – 114 58.5 – 114.5 5
115 – 170 114.5 – 170.5 8
171 – 226 170.5 – 226.5 6
227 – 282 226.5 – 282.5 5
283 – 338 282.5 – 338.5 2
339 – 394 338.5 – 394.5 1
395 – 450 394.5 – 450.5 3
Solution
Using the cumulative frequencies, you can construct the ogive shown. The
upper-class boundaries, frequencies, and cumulative frequencies are shown
in the table. Notice that the graph starts at 58.5, where the cumulative
frequency is 0, and the graph ends at 450.5, where the cumulative frequency
is 30.
Upper Class Boundary   f   Cumulative Frequency
114.5                  5   5
170.5                  8   13
226.5                  6   19
282.5                  5   24
338.5                  2   26
394.5                  1   27
450.5                  3   30
Interpretation: From the ogive, you can see that
about 25 GPS navigators cost $300 or less. The
greatest increase in cumulative frequency occurs between
$114.50 and $170.50, because the line segment
is steepest between these two class boundaries.
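Reading the ogive at $300 amounts to linear interpolation between the two neighbouring plotted points. A minimal sketch of that reading, using the Example # 8 table, is given below; the function name `cumulative_at` is a hypothetical helper, and straight-line interpolation within a class is the usual ogive assumption rather than something the data guarantee.

```python
upper_boundaries = [114.5, 170.5, 226.5, 282.5, 338.5, 394.5, 450.5]
frequencies = [5, 8, 6, 5, 2, 1, 3]

# Ogive points: (class boundary, cumulative frequency), starting at (58.5, 0)
points = [(58.5, 0)]
for boundary, f in zip(upper_boundaries, frequencies):
    points.append((boundary, points[-1][1] + f))

def cumulative_at(x, pts):
    """Read the ogive at x by linear interpolation between neighbouring points."""
    for (x0, c0), (x1, c1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return c0 + (x - x0) / (x1 - x0) * (c1 - c0)
    raise ValueError("x lies outside the range of the ogive")

estimate = cumulative_at(300, points)

# All classes have the same width, so the steepest segment is the one
# with the largest rise in cumulative frequency
steepest = max(zip(points, points[1:]), key=lambda seg: seg[1][1] - seg[0][1])

print(round(estimate, 3))               # roughly 25 navigators
print(steepest[0][0], steepest[1][0])   # boundaries of the steepest segment
```

The interpolated value lies between the cumulative frequencies 24 (at 282.5) and 26 (at 338.5), in line with the "about 25" read from the graph.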
Question # 8.
Using the following histogram:
(i) Construct a frequency distribution that includes class limits, class
frequencies, midpoints and cumulative frequencies.
(ii) How many values are in the class 27.5 – 30.5?
(iii) How many values fall between 24.5 and 36.5?
(iv) How many values are below 33.5?
(v) How many values are above 30.5?
Example # 9: The following table lists the number of cellular telephone
subscribers in millions. Construct a time series graph for the number of cellular
subscribers. What can you conclude?
Year   Subscribers (in millions)
2001   5.3
2002   7.6
2003   11.0
2004   16.0
2005   24.1
2006   33.8
2007   44.0
[Figure: Time series graph. Vertical axis: Subscribers in millions (0 to 45); horizontal axis: Year (2001 to 2007).]
From the graph we can see that the number of subscribers has been increasing since 2001. Recent years
show greater increases.
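The conclusion can be checked numerically by computing the year-over-year change from the table. This is a small sketch with illustrative variable names; the changes are rounded to one decimal place to match the precision of the data.

```python
subscribers = {2001: 5.3, 2002: 7.6, 2003: 11.0, 2004: 16.0,
               2005: 24.1, 2006: 33.8, 2007: 44.0}

years = sorted(subscribers)

# Year-over-year change in millions, rounded to one decimal place
increases = {
    later: round(subscribers[later] - subscribers[earlier], 1)
    for earlier, later in zip(years, years[1:])
}

for year, change in increases.items():
    print(year, change)
```

Every change is positive and each is larger than the one before, which is what the steepening graph shows.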
Example # 10: Construct a Pareto chart & pie chart for the total investment of the various types of categories
during the year 2005.
Solution
Pie Table
Investment Category   Investor A   Investor B   Investor C   Total Investment   Percentages        Angle of Sectors (Degrees)   Cumulative Angles
Bonds                 32           44           19           95                 95 / 324 = 0.29    0.29 x 360 = 104.4           104.4
Stocks                46.5         55           27.5         129                129 / 324 = 0.40   0.40 x 360 = 144             248.4
Savings               16           28           7            51                 51 / 324 = 0.16    0.16 x 360 = 57.6            306
CD                    15.5         20           13.5         49                 49 / 324 = 0.15    0.15 x 360 = 54              360
[Figure: Pie Chart. Stocks 40%, Bonds 29%, Savings 16%, CD 15%.]
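The sector angles in the pie table can be recomputed as follows. The sketch mirrors the table's approach of rounding each share to two decimal places before multiplying by 360; that rounding step is taken from the table, while the variable names are illustrative.

```python
totals = {"Bonds": 95, "Stocks": 129, "Savings": 51, "CD": 49}
grand_total = sum(totals.values())

angles, cumulative = {}, {}
running = 0.0
for category, amount in totals.items():
    share = round(amount / grand_total, 2)        # proportion, two decimal places
    angles[category] = round(share * 360, 1)      # sector angle in degrees
    running = round(running + angles[category], 1)
    cumulative[category] = running                # cumulative angle

print(angles)
print(cumulative)
```

Here the rounded shares happen to add up to exactly 1, so the angles total 360 degrees; with other data, rounded shares may not sum to 1 and the last sector may need a small adjustment.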
Classes           Frequency   Relative Frequency   Cumulative Frequency
0.7312 – 0.7313   -           -                    -
0.7314 – 0.7315   23          -                    29
0.7316 – 0.7317   -           0.34                 -
0.7318 – 0.7319   17          0.17                 -
0.7320 – 0.7321   -           -                    92
0.7322 – 0.7323   -           -                    -
Total             100         1.00
Question # 10 Find the missing entries in the following frequency distribution table.
Class Limits   f   Relative Frequency   Cumulative Frequency   Cumulative Percentage
8 to –         –   –                    –                      25
– to –         –   0.05                 –                      –
– to –         –   –                    9                      –
– to –         –   0.30                 15                     –
– to 32        –   –                    –                      –
Total          –   –
Answer: Here range is 32 – 8 = 24 & no of classes = 5, so h = 24/5 = 4.8 ≈ 5. Thus class
limits are 8 to 12, 13 to 17, …, 28 to 32. The f against 4th class is 15 – 9 = 6. Let the
total frequency be X. Then 0.30X = 6 gives X = 20. Relative frequency 0.05 gives f = 1.
f = 5, 1, 3, 6, 5. Rf = 0.25, 0.05, 0.15, 0.30, 0.25. C = 5, 6, 9, 15, 20. C%age = 25, 30, 45, 75, 100.
Class Limits   f    Relative Frequency   Cumulative Frequency   Cumulative Percentage
8 to –         5    0.25                 5                      25
– to –         1    0.05                 6                      30
– to –         3    0.15                 9                      45
– to –         6    0.30                 15                     75
– to 32        5    0.25                 20                     100
Total          20   1
CRITICAL THINKING PROBLEM
( No # 1 )
WATER-UTILITY COMPANY
CRITICAL THINKING PROBLEM
( No # 2 )
MISLEADING GRAPHS
Explain why the graph is misleading. Redraw the graph so that it is not misleading.
(i) [Figure: pie chart titled "Company Sales".]
1st quarter 20%, 2nd quarter 15%, 3rd quarter 45%, 4th quarter 20%
Answer: The pie chart should be displaying all four quarters, not just the first three.
(ii) [Figure: chart titled "Company Sales". Vertical axis from 90 to 120; horizontal axis "Quarters" in the order 3rd, 2nd, 1st, 4th.]
Answer: When data is taken at regular intervals over a period of time, a time series chart should be used.
Question # 11
Shown here are four frequency distributions. Each is incorrectly constructed. State the
reason why.
(a) Class     Frequency
    27 – 32   1
    33 – 38   0
    39 – 44   6
    45 – 49   4
    50 – 55   2
(b) Class     Frequency
    5 – 9     1
    9 – 13    2
    13 – 17   5
    17 – 20   6
    20 – 24   3
Answer: (a) Class width is not uniform. (b) Class limits overlap, and class width is not uniform. (c) A class has been omitted.
(d) Class width is not uniform.
Example # 12:
Example # 13:
Importance of Statistics in Different Fields
Statistics plays a vital role in every field of human activity. It has an important role in
determining the existing position of per capita income, unemployment, population growth
rate, housing, schooling, medical facilities, etc., in a country. Statistics now holds a
central position in almost every field, such as Industry, Commerce, Trade, Physics,
Chemistry, Economics, Mathematics, Biology, Botany, Psychology, Astronomy, etc., so the
application of statistics is very wide. We now discuss some important fields in which
statistics is commonly applied.
Sports
Statistics help evaluate and analyze individual and team performance in various sports.
By tracking and analyzing performance metrics such as goals scored, shooting
percentages, completion rates, and other relevant indicators, coaches and analysts can
identify strengths and weaknesses, make tactical adjustments, and develop game
strategies. Statistics in sports provide a quantitative foundation for analysis, decision
making, and performance improvement. They help stakeholders in sports understand the
game better, evaluate performance objectively, and drive innovation and development in
the sporting world.
Business and Economics
Statistics plays an important role in business. A successful businessman must be
quick and accurate in decision making. He must know what his customers want, and
therefore what to produce and sell and in what quantities. Statistics helps the
businessman plan production according to the tastes of the customers; the quality of the
products can also be checked more efficiently by using statistical methods. Thus all the
activities of the businessman are based on statistical information. Statistical techniques are
widely used in the study of economic problems. Changes in prices cannot be studied
by a layperson without statistical methods. Imports and exports, the inflation rate, and per capita income
are problems which require a good knowledge of statistics.
Insurance
Statistics plays a fundamental role in the insurance industry by providing insights into
risk assessment, underwriting, claims analysis, fraud detection, portfolio management,
pricing, and product development. By utilizing statistical techniques, insurance
companies can make informed decisions, improve profitability, and effectively manage
risks.
Banks
Statistics is widely used in banks. All the commercial banks do business with the help of
deposits in the banks. The banks work on the principle that all depositors do not
withdraw their deposits on the same day. The bank earns profits out of these deposits by
lending to others on interest. Statistics enable banks to make data-driven decisions,
manage risks, optimize operations, and provide better services to their customers. It plays
a vital role in various aspects of banking operations, from credit assessment and risk
management to marketing strategies and investment decision-making.
In State Management (Administration)
Statistics facilitates the collection, organization, and analysis of data related to various
aspects of governance, including demographics, economics, public health, education,
crime rates, and more. This data helps policymakers and administrators understand the
current situation, identify trends, and monitor progress over time. Statistics provide a
factual basis for policy formulation and planning. By analyzing statistical data,
policymakers can identify social, economic, and environmental challenges, as well as
opportunities.
Statistics in state management supports evidence-based decision making, policy
formulation, program evaluation, resource allocation, and performance measurement. By
utilizing statistical data and analysis, administrators can effectively address societal
challenges, improve governance, and achieve desired outcomes.
Meteorology
Statistics plays a significant role in the field of meteorology, particularly in weather forecasting.
Meteorologists must collect and analyze data which are complex in nature and are affected
by many factors beyond their control. Statistical methods help in analyzing the effects
of the different factors.
Question:
Explain the importance of statistics in different fields.
Homework
EXERCISES. (Statistics for Business & Economics, Newbold, 6th Edition)