
COMP-SCI 5590-0012

Econometrics of data science

LECTURE 1-4: DESCRIPTIVE STATISTICS


Rezwana Rafiq
Adjunct Instructor, University of Missouri-Kansas City
Assistant Project Scientist, University of California Irvine
Recap
▪ Descriptive statistics consists of methods for organizing, displaying, and
describing data by using tables, graphs, and summary measures.

Median speed before the law = 57.87 mph


Median speed after the law = 59.45 mph

2
Recap
▪ Some methods and techniques are available to summarize and interpret the data.
• Point estimates: these are the single values of the sample statistics/estimates
that are used to estimate the population parameters.
• Graphical presentations: commonly used graphical representations of data
(e.g., boxplots, histograms, bar charts)

3
Recap
▪ Point estimates
▪ Measures of central tendency: arithmetic mean, median, mode
▪ Measures of variability: variance, standard deviation, range
▪ Measures of position: quartiles, interquartile range, percentile
▪ Measures of distribution: skewness, kurtosis
▪ Measures of association: covariance, correlation
▪ Graphical representation

* Properties of estimators

4
Methods of Displaying Data
▪ Histogram
▪ Boxplot
▪ Scatter plot
▪ Bar chart
▪ Pie chart

5
Methods of Displaying Data: Histogram
▪ A histogram is a chart in which classes are marked on the horizontal axis and the
frequencies, relative frequencies, or percentages are marked on the vertical axis.
▪ The height of each bar is proportional to the frequency of values in the class
represented by the bar. Bars are drawn adjacent to each other.
▪ Histograms are useful for uncovering asymmetries in data.
Class      Frequency   Relative frequency
5-9        3           3/30 = 0.10
10-14      6           6/30 = 0.20
15-19      8           8/30 = 0.267
20-24      8           8/30 = 0.267
25-29      5           5/30 = 0.167

[Figure: two histograms drawn over the classes 5 – 10, 10 – 15, 15 – 20, 20 – 25, 25 – 30]

Fig. source: Mann, P.S. (2011) Introductory Statistics
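
The relative frequency column in the table is each class frequency divided by the total number of observations. A minimal Python sketch of that arithmetic, using only the class labels and counts shown on this slide (the variable names are illustrative, not from the lecture):

```python
# Class labels and frequencies copied from the table on this slide.
classes = ["5-9", "10-14", "15-19", "20-24", "25-29"]
frequencies = [3, 6, 8, 8, 5]

# Total number of observations across all classes.
total = sum(frequencies)

# Relative frequency = class frequency / total number of observations.
relative = [f / total for f in frequencies]

for label, f, r in zip(classes, frequencies, relative):
    print(f"{label:>6}  {f:>2}  {f}/{total} = {r:.3f}")
```

The relative frequencies sum to 1, so multiplying each by 100 gives the percentages that can also be marked on the vertical axis.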

6
Methods of Displaying Data: Histogram
Travel time from home to university (in minutes)
6 9 11 15 21 24 29 26 26 16 17 16 18 20 22 23 28 5 ……..

Class      Frequency   Relative frequency
5-10       3           3/30 = 0.10
10-15      6           6/30 = 0.20
15-20      8           8/30 = 0.267
20-25      8           8/30 = 0.267
25-30      5           5/30 = 0.167

[Figure: two histograms drawn over the classes 5 – 10, 10 – 15, 15 – 20, 20 – 25, 25 – 30]

Fig. source: Mann, P.S. (2011) Introductory Statistics
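
Going from raw travel times to class frequencies means counting how many values fall in each class. The sketch below does this for the 18 values visible on this slide only; the remaining observations are elided, so the counts will not match the table for all 30 observations. Which class a boundary value (such as 15 or 20) belongs to is not stated on the slide, so the left-closed convention here is an assumption, and `bin_counts` is an illustrative helper.

```python
# Only the travel times visible on the slide (the rest are elided there).
visible_times = [6, 9, 11, 15, 21, 24, 29, 26, 26,
                 16, 17, 16, 18, 20, 22, 23, 28, 5]

# Class boundaries in minutes: 5-10, 10-15, 15-20, 20-25, 25-30.
edges = [5, 10, 15, 20, 25, 30]


def bin_counts(values, edges):
    """Count values per class [lo, hi); the last class also includes its upper edge.

    The left-closed rule is an assumption, not something the slide specifies.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            lo, hi = edges[i], edges[i + 1]
            is_last = i == len(counts) - 1
            if lo <= v < hi or (is_last and v == hi):
                counts[i] += 1
                break
    return counts


counts = bin_counts(visible_times, edges)
n = len(visible_times)
for i, c in enumerate(counts):
    print(f"{edges[i]}-{edges[i + 1]}: frequency {c}, relative frequency {c / n:.3f}")
```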

7
Methods of Displaying Data: Histogram
Speed data (in miles per hour)
55 61 66 59 45 39 70 65 68 69 58 79 65………..

8
Methods of Displaying Data: Box and Whisker Plot
▪ This plot shows the center, spread, skewness, and outliers in data.
▪ It is constructed by drawing a box and two whiskers that use the median, the first
quartile, the third quartile and the smallest and largest values in the data set between
the lower and upper inner fences. If the data set includes one or more outliers, they
are plotted separately as points on the chart.

Fig. source: Washington, S. et al. (2020) Stat. and Econ. Methods for Transportation Data Analysis

9
Methods of Displaying Data: Box and Whisker Plot
Age (30+ years)
Missouri: 55 61 66 59 45 39 70 65 68 69 58 79 65………..
Kansas: 30 41 56 59 35 39 40 65 61 70 55 67 62………..
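
Side-by-side boxplots compare groups mainly through their medians and spreads. As a rough illustration, the sketch below computes the median, minimum, and maximum for the values visible on this slide only; the elided observations are not included, so these numbers describe the visible values and not the full samples.

```python
from statistics import median

# Only the ages visible on the slide; the remaining observations are elided.
missouri = [55, 61, 66, 59, 45, 39, 70, 65, 68, 69, 58, 79, 65]
kansas = [30, 41, 56, 59, 35, 39, 40, 65, 61, 70, 55, 67, 62]

# Summaries of the visible values for each state.
summary = {
    name: {"min": min(ages), "median": median(ages), "max": max(ages)}
    for name, ages in (("Missouri", missouri), ("Kansas", kansas))
}

for name, stats in summary.items():
    print(name, stats)
```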

10
Methods of Displaying Data: Box and Whisker Plot
▪ Household (HH) income, in thousands of dollars, for 12 households
▪ 69 74 75 79 81 84 90 94 98 104 112 144
Step 1: Rank the data in increasing order and calculate the quartile values.
Median = (84 + 90)/2 = 87
Q1 = (75 + 79)/2 = 77
Q3 = (98 + 104)/2 = 101
IQR = Q3 – Q1 = 101 – 77 = 24

Step 2: Find the points that are 1.5 × IQR below Q1 and 1.5 × IQR above Q3.
1.5 × IQR = 1.5 × 24 = 36
Lower inner fence = Q1 – 36 = 77 – 36 = 41
Upper inner fence = Q3 + 36 = 101 + 36 = 137

Step 3: Determine the smallest and the largest values within the two inner fences.
Smallest value within two inner fences = 69
Largest value within two inner fences = 112

Step 4: Draw the boxplot and show the outliers. Here 144 lies above the upper
inner fence (137), so it is plotted separately as an outlier.

Fig. source: Mann, P.S. (2011) Introductory Statistics
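
Steps 1 to 3 can be sketched in Python as follows. The quartiles are computed as medians of the lower and upper halves of the ranked data, which matches the hand calculation on this slide; many software packages interpolate instead and may report slightly different quartiles. The function name `boxplot_stats` is illustrative.

```python
from statistics import median

# Household income (in thousands of dollars) for the 12 households on the slide.
incomes = [69, 74, 75, 79, 81, 84, 90, 94, 98, 104, 112, 144]


def boxplot_stats(values):
    """Quartiles, inner fences, whisker ends, and outliers for a boxplot.

    Quartiles are medians of the lower and upper halves of the ranked data
    (the median itself is excluded from both halves when n is odd).
    """
    data = sorted(values)                      # Step 1: rank the data
    n = len(data)
    lower_half = data[: n // 2]
    upper_half = data[n // 2 + n % 2:]
    q1, q2, q3 = median(lower_half), median(data), median(upper_half)
    iqr = q3 - q1

    lower_fence = q1 - 1.5 * iqr               # Step 2: inner fences
    upper_fence = q3 + 1.5 * iqr

    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "whisker_low": min(inside),            # Step 3: whisker ends
        "whisker_high": max(inside),
        "outliers": outliers,
    }


stats = boxplot_stats(incomes)
for key, value in stats.items():
    print(f"{key}: {value}")
```

The whiskers end at the smallest and largest values inside the fences, not at the fences themselves.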

11
Methods of Displaying Data: Scatter Plot
▪ A plot of paired observations is called a scatter plot.

Fig. source: Mann, P.S. (2011) Introductory Statistics

12
Methods of Displaying Data: Bar Chart
▪ A chart made of bars whose heights represent the frequencies of respective
categories is called a bar chart.

Fig. source: Mann, P.S. (2011) Introductory Statistics

13
Difference between bar chart and histogram
▪ A histogram generally shows the frequency distribution of a continuous variable,
whereas a bar chart compares discrete values or categories of a variable.
▪ A histogram presents numerical data, whereas a bar chart can show either numerical
or categorical data.

Fig. source: Google.com

14
Methods of Displaying Data: Pie Chart
▪ A circle divided into portions that represent the relative frequencies or percentages
of a population or a sample belonging to different categories is called a pie chart.

Fig. source: Mann, P.S. (2011) Introductory Statistics

Distribution of transit trips (N = 10,000) by activity purposes using 2017 NHTS data

15
