
Lecture 1 – Data Representation

Dr. Salman Saeed


Assistant Professor,
National Institute of Urban Infrastructure Planning (NIUIP)
University of Engineering and Technology, Peshawar

salmansaeed@uetpeshawar.edu.pk
How are decisions made?

• How many primary schools should be built? How many rooms should each have?
– Needs data on demographics and education levels
• How wide a road is needed? How large a truck terminal is required?
– Needs data on traffic and freight
• Which players should be selected for a team in the PSL, WT20 or WC?
– Needs data on player performance and venue conditions
16-Jul-2020 Lecture # 01 GEOL 703 – Applied Geo-statistics Dr. Salman Saeed, NIUIP, UET Peshawar
Variables & Cases

Cases

Variables
Right-hand bat Left Hand Bat Right Hand Bat
Leg-break googly Right-arm off-break Right Arm Off-break
6ft 0in 6ft 2in 6ft 4 in
1405 1510 1176
97 17 10
36 Years, 1 month 36 years, 6 months 35 years, 9 months

• Each case has a set of characteristics.
• For characteristics to be variables, they need to change (vary) across cases.

Variables – characteristics whose values change across cases (variation)

Constants – characteristics whose values do not vary across cases (no variation)

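The distinction between variables and constants can be checked mechanically. The sketch below is only an illustration: the three cases, the characteristic names and the values are made up, and `split_variables_and_constants` is a hypothetical helper, not part of the lecture.

```python
# Hypothetical cases: each dict is one case, each key is one characteristic.
cases = [
    {"sport": "cricket", "batting_hand": "right", "height_cm": 183},
    {"sport": "cricket", "batting_hand": "left", "height_cm": 188},
    {"sport": "cricket", "batting_hand": "right", "height_cm": 193},
]


def split_variables_and_constants(cases):
    """Return (variables, constants): names whose values do / do not vary."""
    variables, constants = [], []
    for name in cases[0]:
        distinct_values = {case[name] for case in cases}
        if len(distinct_values) > 1:
            variables.append(name)   # values change across cases
        else:
            constants.append(name)   # same value for every case
    return variables, constants


variables, constants = split_variables_and_constants(cases)
print("variables:", variables)
print("constants:", constants)
```

Here `sport` is a constant because every case is a cricketer, while batting hand and height vary from case to case.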
Levels of Measurement

Level      Difference   Order   Similar Intervals
Nominal    ✓

• Nominal measurement uses symbols to classify observations into categories that are:
– mutually exclusive
– exhaustive
• Mutually exclusive means that the categories must be distinct enough that no observation falls into more than one category.
• Exhaustive means that there must be enough categories for every observation to fall into some category.
• Examples: country, city, type of concrete, colour of an object.
• The categories have no explicit relationship with one another, so they cannot be sorted; for example, we cannot say that one category is bigger than another.

Levels of Measurement

Level                    Difference   Order   Similar Intervals
Nominal (categorical)    ✓
Ordinal (categorical)    ✓            ✓

• Ordinal measurement uses symbols to classify observations into categories that are:
– mutually exclusive
– exhaustive
– explicitly related to one another
• For example: taller and shorter, greater and lesser, faster and slower, harder and easier, and so forth.
• Each observation must still fall into one of the categories (exhaustive) but no more than one (mutually exclusive).
• Examples: ranks in the bureaucracy or the military; levels of satisfaction, happiness or comfort.
• Both nominal and ordinal variables are called categorical variables.

Levels of Measurement

• There is no known interval between ordinal categories.
• There is no way of saying how large the difference between two consecutive categories is.
• For example, there is no measurable difference between SDO and XEN, captain and major, orange and blue, less happy and more happy, or less comfortable and more comfortable.

Levels of Measurement

Level                    Difference   Order   Similar Intervals
Nominal (categorical)    ✓
Ordinal (categorical)    ✓            ✓
Interval                 ✓            ✓       ✓

• Interval measurement uses numbers to measure observations in categories that are:
– mutually exclusive
– exhaustive
– explicitly related to one another
– related in a way that is known and exact
• A common and constant unit of measurement has been established between the categories.
• Examples: temperature, IQ

Levels of Measurement

Level                    Difference   Order   Similar Intervals   Meaningful Zero Point
Nominal (categorical)    ✓
Ordinal (categorical)    ✓            ✓
Interval                 ✓            ✓       ✓

• Interval variables take numerical values without a true zero point.
• Intervals between the values are equal and meaningful, but the numbers themselves are arbitrary.
• A value of 0 does not indicate a complete lack of the quantity; for example, a temperature of 0 °C does not mean that there is no temperature.

Levels of Measurement

Level                      Difference   Order   Similar Intervals   Meaningful Zero Point
Nominal (categorical)      ✓
Ordinal (categorical)      ✓            ✓
Interval (quantitative)    ✓            ✓       ✓
Ratio (quantitative)       ✓            ✓       ✓                   ✓

• Ratio measurement uses numbers to measure observations in categories that are:
– mutually exclusive
– exhaustive
– explicitly related to one another
– related in a way that is known and exact
– measured from a meaningful zero point: the numbers originate from a specified point
• Examples: weight, area, speed, length

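The table of levels is cumulative: each level has every property of the level before it plus one more. A minimal sketch of that idea follows; the names `PROPERTIES`, `LEVELS`, `properties_of` and `kind_of` are chosen here for illustration only.

```python
# Properties in the order the columns appear in the slide's table.
PROPERTIES = ("difference", "order", "similar intervals", "meaningful zero point")

# Each level adds one property to the level before it.
LEVELS = ("nominal", "ordinal", "interval", "ratio")


def properties_of(level):
    """Return the properties that hold for the given level of measurement."""
    return PROPERTIES[: LEVELS.index(level) + 1]


def kind_of(level):
    """Nominal and ordinal are categorical; interval and ratio are quantitative."""
    return "categorical" if LEVELS.index(level) < 2 else "quantitative"


for level in LEVELS:
    print(level, kind_of(level), properties_of(level))
```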
Levels of Measurement

Discrete
– Variables that can be counted
– Always in whole numbers
– Two children, five teams, eleven players, etc.

Continuous
– Variables that cannot be counted
– Represent a continuous interval with infinitely many possible values
– A height of 2 cm, 5 cm, 3.45 cm, 3.4598 cm or 3.4597654321 cm

Levels of Measurement

[Diagram: Levels of Measurement, Statistical Methods]

• Sometimes the distinctions between levels of measurement get blurred.
• An ordinal variable with 10 or more categories may be treated as an interval variable if the categories are labelled with numbers.
• Example: ratings of a player
• Similarly, interval variables are sometimes treated as ratio variables.

Data Presentation

[Diagram: Study; Data, consisting of Variables and Cases; Order and Presentation]

Example: A study of all cricket players in KP

Variables: Age, Weight, Runs Scored, and Bowling Style

Data Matrix
Variables
Age Weight Bowling Style Average
Player 1 23 65 RAF 45.76
Player 2 27 75 LAF 42.54
Player 3 25 81 LAM 22.84
Player 4 34 69 LAS 38.78
Player 5 33 73 RAS 27.4
Player 6 24 72 RAM 20.3
Player 7 24 85 RAF 44.36
Player 8 20 71 LAF 29.42
Player 9 19 71 LAM 21.09
Player 10 27 79 LAS 42.24
Player 11 28 69 RAS 45.25
Player 12 33 79 RAM 44.16
……. - - - -
……. - - - -
……. - - - -
Player 400 23 81 RAM 45.98
What kind of variables do we have in each column?
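One way to picture the data matrix is as rows (cases) by columns (variables), where each cell holds one value. The sketch below uses the first three rows of the table above; the column names and the helpers `case`, `variable` and `observation` are illustrative choices, not part of the lecture.

```python
# First three rows of the slide's data matrix: one row per case.
columns = ("age", "weight", "bowling_style", "average")
data_matrix = {
    "Player 1": (23, 65, "RAF", 45.76),
    "Player 2": (27, 75, "LAF", 42.54),
    "Player 3": (25, 81, "LAM", 22.84),
}


def case(name):
    """All values recorded for one case (a row of the matrix)."""
    return dict(zip(columns, data_matrix[name]))


def variable(column):
    """The values of one variable across all cases (a column of the matrix)."""
    position = columns.index(column)
    return [row[position] for row in data_matrix.values()]


def observation(name, column):
    """A single cell: the value of one variable for one case."""
    return data_matrix[name][columns.index(column)]


print(case("Player 3"))
print(variable("age"))
print(observation("Player 2", "bowling_style"))
```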
Data Matrix
• Each cell of the data matrix, such as a single value recorded for Player 7, is an observation.

Data Matrix
Variables
Age Weight Bowling Style Average
Player 1 23 65 RAF 45.76
Player 2 27 75 LAF 42.54
Player 3 25 81 LAM 22.84
Player 4 34 69 LAS 38.78
Player 5 33 73 RAS 27.4
Player 6 24 72 RAM 20.3
Player 7 24 85 RAF 44.36
Player 8 20 71 LAF 29.42
Player 9 19 71 LAM 21.09
Player 10 27 79 LAS 42.24
Player 11 28 69 RAS 45.25
Player 12 33 79 RAM 44.16
Player 13 31 76 RAM 23.40
Player 14 32 74 RAF 32.40
Player 15 32 66 LAF 25.00
Player 16 29 77 RAM 49.20

Data Matrix
Variables
Age Weight Bowling Style Average
Player 17 29 80 RAF 36.70
Player 18 37 77 LAF 45.90
Player 19 33 85 LAM 45.40
Player 20 28 85 LAS 44.60
Player 21 24 80 RAS 30.30
Player 22 37 74 RAM 32.60
Player 23 25 67 RAF 51.10
Player 24 18 78 LAF 41.00
Player 25 35 75 LAM 28.80
Player 26 23 66 LAS 42.80
Player 27 24 75 RAS 28.40
Player 28 24 69 RAM 42.90
Player 29 34 66 RAM 34.80
Player 30 18 83 RAF 43.30
Player 31 18 81 LAF 25.20
Player 32 30 73 RAM 38.80

Data Matrix
Variables
Age Weight Bowling Style Average
Player 33 38 78 RAF 28.50
Player 34 19 72 LAF 45.80
Player 35 18 72 LAM 28.40
Player 36 25 72 LAS 27.40
Player 37 24 75 RAS 50.00
Player 38 31 - RAM 52.00
Player 39 32 83 RAF 36.60
Player 40 32 81 - 27.70
Player 41 27 80 LAM 43.40
Player 42 - 72 LAS 47.30
Player 43 29 71 RAS 53.30
Player 44 20 81 RAM 31.40
Player 45 36 72 RAM 40.50
Player 46 23 69 RAF 28.40
Player 47 24 84 LAF 42.00
Player 48 38 67 RAM 32.90

• Cases with missing values (shown as "-") may be eliminated in subsequent analysis.

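Eliminating cases with missing values can be sketched as follows. The rows are Players 38 to 42 as best they can be read from the slide, with `None` standing in for "-"; `complete_cases` is a hypothetical helper name, and dropping the whole case is only one possible way to handle missing data.

```python
# Players 38-42 from the slide; None stands for a missing value ("-").
rows = [
    ("Player 38", 31, None, "RAM", 52.00),
    ("Player 39", 32, 83, "RAF", 36.60),
    ("Player 40", 32, 81, None, 27.70),
    ("Player 41", 27, 80, "LAM", 43.40),
    ("Player 42", None, 72, "LAS", 47.30),
]


def complete_cases(rows):
    """Keep only the cases that have a value for every variable."""
    return [row for row in rows if None not in row]


kept = complete_cases(rows)
dropped = [row[0] for row in rows if row not in kept]
print("kept:", [row[0] for row in kept])
print("dropped:", dropped)
```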
Data Matrix
Age Weight Runs Average
Player 1 35 80 4868 17.77
Player 2 21 69 1735 22.64
Player 3 20 81 5135 41.33
Player 4 17 84 4052 25.92
Player 5 29 80 1392 25.57
Player 6 18 84 5115 33.15
Player 7 36 85 4172 24.99
Player 8 37 66 4910 15.46
Player 9 38 77 3708 33.83
Player 10 37 69 2158 28.45
Player 11 33 83 2277 40.13
Player 12 30 66 3885 42.28
Player 13 18 77 2470 26.56
Player 14 27 81 1649 16.26
Player 15 36 69 1178 31.51
Player 16 25 85 5055 39.08
Player 17 18 72 3144 24.42
Player 18 25 84 4547 21.10
Player 19 37 75 3458 42.88
Player 20 37 75 3243 17.39
Player 21 32 71 2513 45.30
Player 22 36 77 2346 18.32
Player 23 32 76 4125 24.91
Player 24 20 76 1708 41.31
Player 25 18 76 3133 20.27
Player 26 18 68 5295 23.69
Player 27 36 77 4466 22.08
Player 28 26 74 4279 16.66
Player 29 29 76 2323 17.64
Player 30 28 67 4969 25.92
Player 31 37 83 5397 42.68
Player 32 23 66 4394 28.20
Player 33 22 76 5317 35.25
Player 34 27 84 3069 41.86
Player 35 30 73 4028 20.64
Player 36 26 81 1774 15.07
Player 37 21 80 2419 40.85
Player 38 35 85 4447 16.95
Player 39 20 78 5202 22.33
Player 40 23 76 4950 21.06
Player 41 27 81 2423 19.69
Player 42 19 78 3224 18.73
Player 43 28 69 3712 43.20
Player 44 20 76 4193 23.25
Player 45 21 71 4302 42.88
Player 46 26 72 4242 23.42
Player 47 25 77 4031 32.56
Player 48 28 72 4153 17.73
Player 49 21 78 4522 37.81
Player 50 37 85 4377 37.53
Player 51 30 84 1445 38.94
Player 52 36 71 1912 18.29
Player 53 24 69 4873 41.22
Player 54 24 85 4484 18.88
Player 55 20 67 1175 28.13
Player 56 36 70 5297 18.09
Player 57 27 67 3281 29.52
Player 58 19 65 3647 37.57
Player 59 31 73 3163 20.78
Player 60 27 76 3408 36.79
Player 61 28 69 4833 38.25
Player 62 37 83 4972 44.62
Player 63 21 81 3981 34.77
Player 64 24 84 1240 35.17
Player 65 24 81 3833 19.55
Player 66 24 85 4308 21.34
Player 67 22 72 3440 40.24
Player 68 28 84 1910 36.71
Player 69 32 68 3453 40.14
Player 70 21 85 5130 28.86
Player 71 21 77 1728 21.71
Player 72 24 74 3823 33.97
Player 73 36 70 3950 33.79
Player 74 17 77 3813 38.51
Player 75 35 72 1590 33.22
Player 76 32 83 4015 31.66
Player 77 20 82 3612 43.33
Player 78 28 78 4474 24.92
Player 79 27 66 4528 19.95
Player 80 36 76 3853 17.93
Player 81 24 79 5376 40.34
Player 82 25 78 4557 19.09
Player 83 37 73 5375 18.64
Player 84 33 74 5234 22.30
Player 85 20 72 2984 30.98
Player 86 33 82 3792 36.35
Player 87 17 66 1856 26.51
Player 88 32 77 1691 27.57
Player 89 32 69 3772 29.11
Player 90 33 79 3089 44.15
Player 91 23 84 2111 18.72
Player 92 18 77 3392 21.08

• The data matrix is huge and gives very little information or insight; the information is hidden in the data.
• Statistical analysis moves from the data matrix to summaries of data, using tables and graphs.

Frequency Table
• A frequency table shows how the values of a variable are distributed over the cases.
• First we list all the possible values of the variable; in the next column we count how many cases have each value.

Bowling Style       Frequency
Left Arm Fast       66
Left Arm Medium     119
Right Arm Fast      140
Right Arm Medium    25
Other               50
Total               400

• A total of 400 means that we don’t have any missing data for this variable.

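The two steps of building a frequency table (list the values, then count the cases) can be sketched with Python's standard `collections.Counter`. The input is the bowling-style column of the first twelve players listed in the data matrix; `frequency_table` is an illustrative helper name.

```python
from collections import Counter

# Bowling-style codes of Players 1 to 12 from the data matrix.
styles = ["RAF", "LAF", "LAM", "LAS", "RAS", "RAM",
          "RAF", "LAF", "LAM", "LAS", "RAS", "RAM"]


def frequency_table(values):
    """Return sorted (value, frequency) rows: how many cases have each value."""
    return sorted(Counter(values).items())


table = frequency_table(styles)
total = sum(frequency for _, frequency in table)

for value, frequency in table:
    print(f"{value:<6}{frequency}")
print(f"{'Total':<6}{total}")
```

If the total equals the number of cases, no value is missing for this variable.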
Frequency Table

Bowling Style       Frequency   Percentage
Left Arm Fast       66          16.5
Left Arm Medium     119         29.75
Right Arm Fast      140         35
Right Arm Medium    25          6.25
Other               50          12.5
Total               400         100

• We get each percentage by dividing the frequency by the total and multiplying by 100; for example, 66 / 400 × 100 = 16.5.

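The percentage column can be reproduced from the frequencies in the table. This is a small sketch; the function name `percentages` is chosen here for illustration.

```python
# Frequencies from the bowling-style frequency table.
frequencies = {
    "Left Arm Fast": 66,
    "Left Arm Medium": 119,
    "Right Arm Fast": 140,
    "Right Arm Medium": 25,
    "Other": 50,
}


def percentages(frequencies):
    """Divide each frequency by the total and multiply by 100."""
    total = sum(frequencies.values())
    return {value: 100 * count / total for value, count in frequencies.items()}


pct = percentages(frequencies)
for value, share in pct.items():
    print(f"{value:<17}{share}")
```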
Frequency Table

Bowling Style       Frequency   Percentage   Cumulative Percentage
Left Arm Fast       66          16.5         16.5
Left Arm Medium     119         29.75        46.25
Right Arm Fast      140         35           81.25
Right Arm Medium    25          6.25         87.5
Other               50          12.5         100
Total               400         100

• The cumulative percentage of a row is the sum of the percentages of all rows above it plus its own percentage.
• The first row has nothing above it, so its cumulative percentage equals its percentage (16.5).
• The next value is the sum of the values above it (16.5) plus its own (29.75): 16.5 + 29.75 = 46.25.
• Similarly, the next value is 16.5 + 29.75 + 35 = 81.25.
• Notice that the last value will always be 100.

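The running sum described above is what `itertools.accumulate` from the Python standard library computes. A minimal sketch using the percentage column of the bowling-style table:

```python
from itertools import accumulate

styles = ["Left Arm Fast", "Left Arm Medium", "Right Arm Fast",
          "Right Arm Medium", "Other"]
percentages = [16.5, 29.75, 35, 6.25, 12.5]

# Each entry is the sum of all percentages above it plus its own.
cumulative = list(accumulate(percentages))

for style, pct, cum in zip(styles, percentages, cumulative):
    print(f"{style:<17}{pct:>6}{cum:>8}")
```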
Frequency Table
• Frequency tables work well with categorical data, i.e. ordinal or nominal data.
• Let’s see what one looks like with quantitative data.
• We know that the first step in building a frequency table is to list all possible values of the variable.

Weight Frequency Percentage Cumulative Percentage


65 3
66 4
67 3
68 6
69 2
70 0
71 2
72 1
--- ---
96 1
Total 400 100

We can see the problem here: the list is very long, and each value has a very low frequency compared to the total, so the frequency table won’t give us the insights that we are looking for.

Frequency Table
• The previous example was discrete data.
• Now let’s try the same with continuous data

Average Frequency Percentage Cumulative Percentage


22.11 0
22.12 1
22.13 0
22.14 0
22.1456 0
22.14564 0
22.156 0
22.79 0
--- ---
--- 1
Total 400 100

Do you get the picture?

Continuous data have infinitely many possible values, so we can never make a list of all of them.

Frequency Table
• This problem is solved by converting quantitative data into categorical data using intervals.
• To do this, we divide the total range of values into a manageable number of categories.

Weight Frequency Percentage Cumulative Percentage


65 – 70
71 – 75
76 – 80
81 – 85
86 – 90
91 – 95
Above 95
Total 400 100

In this way we also reduce the number of rows in our frequency table.

The same can be done with continuous data.

Quantitative variables can always be re-coded into categorical variables, but the reverse is not possible.

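Re-coding a quantitative variable into intervals can be sketched as below. The intervals are the ones in the table above; the weights are those of Players 1 to 12 in the data matrix. The sketch assumes weights are whole numbers of at least 65, and `recode` is an illustrative name.

```python
from collections import Counter

# Intervals from the slide as (label, lowest weight, highest weight).
INTERVALS = [
    ("65-70", 65, 70),
    ("71-75", 71, 75),
    ("76-80", 76, 80),
    ("81-85", 81, 85),
    ("86-90", 86, 90),
    ("91-95", 91, 95),
]


def recode(weight):
    """Map a whole-number weight to the label of the interval containing it."""
    for label, low, high in INTERVALS:
        if low <= weight <= high:
            return label
    if weight > 95:
        return "Above 95"
    raise ValueError(f"no interval defined for weight {weight}")


# Weights of Players 1 to 12 from the data matrix.
weights = [65, 75, 81, 69, 73, 72, 85, 71, 71, 79, 69, 79]
frequency = Counter(recode(weight) for weight in weights)

for label, _, _ in INTERVALS:
    print(f"{label:<9}{frequency[label]}")
```

The re-coding loses information: from the label "71-75" alone the original weight cannot be recovered, which is why the reverse conversion is not possible.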
Graphs
• Frequency tables are great, but looking at numbers usually does not reveal the important information, especially if you’re not paying close attention.
• Besides, people interpret information better from a graphical representation than from tables.
• For example, did you notice which bowling style was most prevalent among the players of KP?
• Did you notice which bowling style was not very popular?
• Let’s see the same information in graphs.

Pie Chart
[Pie chart of Frequency by bowling style, drawn from the bowling-style frequency table: Left Arm Fast, Left Arm Medium, Right Arm Fast, Right Arm Medium, Other]

Bar Graph
[Bar graph of Frequency (axis from 0 to 160) by bowling style, drawn from the bowling-style frequency table: Left Arm Fast, Left Arm Medium, Right Arm Fast, Right Arm Medium, Other]

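A bar graph can be approximated in plain text to see how bar length encodes frequency. This is only a rough sketch: one `#` stands for ten players (an arbitrary scale chosen here), partial tens are dropped, and `bar_graph` is an illustrative name.

```python
# Frequencies from the bowling-style frequency table.
frequencies = {
    "Left Arm Fast": 66,
    "Left Arm Medium": 119,
    "Right Arm Fast": 140,
    "Right Arm Medium": 25,
    "Other": 50,
}


def bar_graph(frequencies, players_per_mark=10):
    """Return one text line per category; bar length grows with frequency."""
    lines = []
    for label, count in frequencies.items():
        bar = "#" * (count // players_per_mark)  # whole marks only
        lines.append(f"{label:<17}{bar} {count}")
    return lines


lines = bar_graph(frequencies)
print("\n".join(lines))
```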
Pie Chart vs Bar Graph
• It’s easy to see in a pie chart that around 50% of the players are left-arm bowlers, or that about a third are Right Arm Fast bowlers. This is much harder to see in a bar graph.
• In a bar graph we can read off the number of players who bowl Right Arm Fast (140), but we can’t read that number from the pie chart.

[Figure: pie chart and bar graph of Frequency by bowling style, side by side; bar graph axis from 0 to 160]

Pie Chart vs Bar Graph
• When the number of categories is high, pie charts do not reveal much information or insight
• While

[Figure: pie chart and bar graph of Frequency (axis from 0 to 120) for 21 categories, Category 1 to Category 21]

Graphs

Graphs summarize data:
• Categorical data: pie charts, bar graphs
• Quantitative data: dot plots

Dot Plots
Player Physical Height (cm)
Player 1 199
Player 2 185
Player 3 158
Player 4 164
Player 5 191
Player 6 187
Player 7 176
Player 8 194
Player 9 184
Player 10 180

[Dot plot: one dot per player along a Physical Height (cm) axis running from 155 to 200]
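A dot plot places one dot above each observed value, so the number of dots above a value is its frequency. A minimal text sketch using the ten heights listed above (`dot_plot` is an illustrative name and `o` stands in for a dot):

```python
from collections import Counter

# Physical heights (cm) of Players 1 to 10 from the table above.
heights = [199, 185, 158, 164, 191, 187, 176, 194, 184, 180]


def dot_plot(values):
    """Return one line per observed value, with one 'o' per case at that value."""
    counts = Counter(values)
    return [f"{value:>4} {'o' * counts[value]}" for value in sorted(counts)]


print("\n".join(dot_plot(heights)))
```

With only ten players every height occurs once, so each value gets a single dot; repeated values would stack, as in `dot_plot([180, 180, 185])`.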
Dot Plots
[Dot plot for 150 cases: frequency (0 to 13) against Physical Height (cm), 155 to 200]

• The number of dots above a value is the frequency of that value in the data matrix.
• Such a plot works well for discrete data, but imagine what it would look like for continuous data.
• First of all, there is zero probability that an exact value of a continuous variable will be repeated.

Histograms
[Dot plot for 150 cases: frequency (0 to 13) against Physical Height (cm), 155 to 200]

• Histograms can be used for both discrete and continuous data.
• Two transformations are required to go from a dot plot to a histogram:
– First, the dots are replaced with bars.
– Second, the width of each bar represents a range of values rather than a single value.
– The height of each bar represents the frequency of values within that range.
– Each individual range is called a ‘bin’.

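The binning behind a histogram can be sketched in a few lines. The assumptions here are that bins start at 155 cm, are 5 cm wide, and include their lower edge but not their upper edge; the slides do not state an edge convention. `histogram` is an illustrative name and the heights are the ten values from the dot plot table.

```python
def histogram(values, start, bin_width):
    """Count the values falling in each bin [low, low + bin_width)."""
    counts = {}
    for value in values:
        index = int((value - start) // bin_width)  # which bin the value is in
        low = start + index * bin_width            # lower edge of that bin
        counts[low] = counts.get(low, 0) + 1
    return dict(sorted(counts.items()))            # empty bins are omitted


# Physical heights (cm) of Players 1 to 10.
heights = [199, 185, 158, 164, 191, 187, 176, 194, 184, 180]
bins = histogram(heights, start=155, bin_width=5)

for low, count in bins.items():
    print(f"{low}-{low + 5}: {'#' * count}")
```

Because a bin covers a range, the same function works unchanged for continuous values such as 172.4.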
Histograms vs Frequency Charts
[Histogram for 150 cases: frequency (0 to 13) against Physical Height (cm), 155 to 200; underlying continuous scale]
[Frequency chart for 150 cases: frequency (0 to 13) against Physical Height (cm), 155 to 200; underlying discrete scale]

Histogram bin variation
[Histogram – bin size = 1: frequency (0 to 13) against Physical Height (cm), 155 to 200]
[Histogram – bin size = 1: frequency (0 to 35) against Physical Height (cm), axis labelled 155 to 173]

Histogram bin variation
[Histogram – bin size = 2.5: frequency (0 to 35) against Physical Height (cm), 155 to 197.5]
[Histogram – bin size = 5: frequency (0 to 60) against Physical Height (cm), 155 to 195]

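The effect of bin size can be seen by re-binning the same data with different widths: wider bins give fewer, taller bars. The sketch below uses the ten heights from the dot plot table and assumes bins start at 155 cm and include their lower edge; `bin_counts` is an illustrative name.

```python
import math
from collections import Counter

# Physical heights (cm) of Players 1 to 10.
heights = [199, 185, 158, 164, 191, 187, 176, 194, 184, 180]


def bin_counts(values, bin_width, start=155):
    """Count values per bin, keyed by the lower edge of each non-empty bin."""
    return Counter(
        start + math.floor((value - start) / bin_width) * bin_width
        for value in values
    )


for width in (1, 2.5, 5):
    counts = bin_counts(heights, width)
    print(f"bin size {width}: {len(counts)} non-empty bins, "
          f"tallest bar {max(counts.values())}")
```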
Shape of the Histogram
[Histogram: frequency (0 to 13) against Physical Height (cm), 155 to 200]
