Real Estate Development. To apply that knowledge to survey, analyze, evaluate, and
manage systems and processes in the construction industry.
6/1/2020
Content
Part 1: Exploring and understanding data
1.1 What are statistics?
1.2 Populations and samples
1.3 Variables
1.4 Measures of data
1.5 Pattern of data
1.6 Displaying and describing data
1.7 Tables
1.8 Comparing distributions
Part 2: Exploring relationships between variables
2.1 Scatter plots
2.2 Correlation
2.3 Linear regression
2.4 Multiple regression
Part 3: Data collection
3.1 Three big ideas of sampling
3.2 Methods of data collection
3.3 Scales of measurement
3.4 Sample surveys
Source: Dr. Le Hoai Long’s lecture note
Statisticians often use simple random samples to estimate
the standard deviation of a population, based on sample data.
The best estimate of the standard deviation of a population is:

s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}

where:
• s^2 is the sample variance,
• \bar{x} is the sample mean,
• x_i is the ith element from the sample,
• and n is the number of elements in the sample.
The sample variance can be considered an unbiased estimate
of the population variance.
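As a minimal sketch of the sample variance and sample standard deviation described here, the following Python computes both with the n - 1 divisor. The function names and the data values are made up for illustration and are not taken from the lecture.

```python
import math

def sample_variance(xs):
    """Sample variance s^2: squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_std(xs):
    """Sample standard deviation s: the square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

# Hypothetical sample (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]
print("s^2 =", sample_variance(data))
print("s   =", sample_std(data))
```

For this sample the mean is 5 and the squared deviations sum to 32, so s^2 is 32/7 and s is its square root.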
1.6 Displaying and describing data
1.6.2 Histograms
Like a bar chart, a histogram is made up of columns plotted on
a graph. Usually, there is no space between adjacent columns.
Here is how to read a histogram.
The columns are positioned over a label that represents a
continuous, quantitative variable.
The column label can be a single value or a range of values.
The height of the column indicates the size of the group
defined by the column label.
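To illustrate how column heights arise, here is a small sketch that counts values into equal-width ranges and prints a text histogram. The helper name, the bin edges, and the data are assumptions chosen for the example; each bin includes its lower edge and excludes its upper edge, which is one common convention.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall in each range [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)
        if 0 <= idx < n_bins:  # values outside the covered range are ignored
            counts[idx] += 1
    return counts

# Hypothetical data (illustrative values only).
values = [3, 7, 12, 15, 18, 21, 22, 25, 29, 34]
start, width, n_bins = 0, 10, 4
counts = histogram_counts(values, start, width, n_bins)

# Each row is one column of the histogram, drawn sideways:
# the label is the range, and the bar length is the group size.
for i, c in enumerate(counts):
    lo = start + i * width
    print(f"{lo:>3} to {lo + width:<3} | " + "#" * c)
```

The bar lengths correspond to the column heights in a drawn histogram: the group size for each range of the quantitative variable.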
The Difference Between Bar Charts and Histograms
With bar charts, each column represents a group defined by
a categorical variable;
and with histograms, each column represents a group
defined by a continuous, quantitative variable.
It is appropriate to talk about the skewness of a histogram.
How about a bar chart?
1.6.3 Pie Charts
Pie charts display all the cases as a circle whose slices have
areas proportional to each category’s fraction of the whole.
Pie charts give a quick impression of the distribution.
Because we’re used to cutting up pies into 2, 4, or 8 pieces,
pie charts are particularly good for seeing relative
frequencies near 1/2, 1/4, or 1/8.
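The proportionality between slices and category fractions can be sketched as follows. Since a slice's area is proportional to its central angle, the code reports each category's fraction and the matching angle in degrees. The function name, category names, and counts are hypothetical.

```python
def pie_slices(counts):
    """Map each category to (fraction of the whole, central angle in degrees)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cases to display")
    return {cat: (c / total, 360.0 * c / total) for cat, c in counts.items()}

# Hypothetical category counts (illustrative values only).
projects = {"Residential": 10, "Commercial": 5, "Industrial": 5}
slices = pie_slices(projects)

for cat, (fraction, angle) in slices.items():
    print(f"{cat:<12} fraction = {fraction:.2f}, angle = {angle:.0f} degrees")
```

Here the fractions are 1/2, 1/4, and 1/4, which are the kind of relative frequencies a pie chart shows well.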
1.6.3 Pie Charts (cont.)
Bar charts are almost always better than pie charts for
comparing the relative frequencies of categories.
Pie charts are widely understood and colorful, and
they often appear in reports.
Problem?
1.6.4 Dotplot
A dotplot is made up of dots plotted on a graph.
Each dot can represent a single observation or a specified
number of observations.
The dots are stacked in a column over a category.
If the categories are quantitative, the pattern of data in a
dotplot can be described in terms of symmetry and skewness.
Dotplots are used most often to plot frequency counts within
a small number of categories, usually with small sets of data.
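A rough text rendering of a dotplot might look like the sketch below, where each `*` stands for one observation and the marks are stacked in a column over each category. The function name and data are invented for the example, and the layout assumes single-character category labels so that the columns line up.

```python
from collections import Counter

def dotplot(values):
    """Return a text dotplot: one '*' per observation, stacked over each category."""
    counts = Counter(values)
    cats = sorted(counts)
    height = max(counts.values())
    rows = []
    # Build rows from the top of the tallest column down to the baseline.
    for level in range(height, 0, -1):
        row = " ".join("*" if counts[c] >= level else " " for c in cats)
        rows.append(row.rstrip())
    rows.append(" ".join(str(c) for c in cats))  # category labels
    return "\n".join(rows)

# Hypothetical small data set (illustrative values only).
observations = [1, 2, 2, 3, 3, 3, 4]
print(dotplot(observations))
```

Because the categories here are quantitative, the shape of the stacks can be read for symmetry or skewness, as described above.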
1.6.5 Stemplots
A stemplot (also known as a stem-and-leaf plot) is a type of chart that shows
how individual values are distributed within a set of data.
A stemplot is used to display quantitative data, generally from
small data sets (50 or fewer observations).
The entries on the left are called stems, and the entries on
the right are called leaves.
Stemplots usually do not include explicit labels for the stems
and leaves.
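As an illustration of stems and leaves, the sketch below builds a stemplot for non-negative integers, taking the tens part as the stem and the units digit as the leaf. That split is an assumption made for this example (other data may call for a different split), and the function name and values are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines: stem (tens) on the left, leaves (units) on the right."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    lines = []
    # Include empty stems so gaps in the data remain visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# Hypothetical small data set (illustrative values only).
scores = [12, 15, 21, 23, 23, 28, 41]
for line in stemplot(scores):
    print(line)
```

Each original value can be read back from the plot by joining a stem with one of its leaves, for example stem 2 and leaf 8 give 28.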
1.6.6 Boxplot Basics
A boxplot splits the data set into quartiles.
The body of the boxplot consists of a "box", which goes from
the first quartile (Q1) to the third quartile (Q3).
Within the box, a horizontal line is drawn at Q2,
the median of the data set.
Two vertical lines, called whiskers, extend from the top and
bottom of the box. The upper whisker goes from Q3 to the
largest non-outlier in the data set, and the lower whisker
goes from Q1 to the smallest non-outlier.
If the data set includes one or more outliers, they are plotted
separately as points on the chart.
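The pieces of a boxplot can be computed with a short sketch like the one below. Two choices here are assumptions, since the text does not specify them: quartiles are found by linear interpolation between sorted values, and a point is treated as an outlier when it lies more than 1.5 times the interquartile range beyond Q1 or Q3 (a common convention). The function names and data are made up.

```python
def quantile(sorted_xs, p):
    """Quantile by linear interpolation between the two nearest sorted values."""
    pos = p * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])

def boxplot_summary(xs):
    """Return the box (Q1, median, Q3), the whisker ends, and any outliers."""
    s = sorted(xs)
    q1, q2, q3 = (quantile(s, p) for p in (0.25, 0.5, 0.75))
    iqr = q3 - q1
    # Assumed outlier rule: beyond 1.5 * IQR from the box.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in s if low_fence <= x <= high_fence]
    return {
        "q1": q1, "median": q2, "q3": q3,
        "whisker_low": inside[0],    # smallest non-outlier
        "whisker_high": inside[-1],  # largest non-outlier
        "outliers": [x for x in s if x < low_fence or x > high_fence],
    }

# Hypothetical data set (illustrative values only).
summary = boxplot_summary([2, 4, 5, 6, 7, 8, 9, 10, 25])
print(summary)
```

With this data the box runs from 5 to 9 with the median at 7, the whiskers end at 2 and 10, and 25 is plotted separately as an outlier.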