
PROBABILITY AND STATISTICS

LECTURE 1
INTRODUCTION &
DESCRIPTIVE STATISTICS I

Adapted from http://www.prenhall.com/mcclave


OUTLINE

1. Introduce Course Objectives and Requirements
2. Introduce Statistics and its Applications
3. Describe Types of Data, Ways of Collecting Data
4. Introduce Graphical Methods for Describing Data of 1 Variable
COURSE OBJECTIVES
This course aims to:
• Introduce basic probabilistic and statistical methods for data analysis
• Discuss how to apply statistical techniques in practical situations
• Enable students to work on a small-scale project to gain data analysis experience
• Provide practical sessions where students learn how to use statistical software
COURSE OUTLINE

• Read your course outline carefully. It contains important information regarding the course schedule, assessment, etc.
• Assessment: Attendance (10%), Project (20%),
Midterm (20%)
• Eligibility for final exam: follow faculty
regulations
STUDY MATERIALS

• Study materials: lecture notes, required readings, tutorial exercises, etc.
• Please note that the materials will be posted and/or updated on Google Classroom.
• Google Classroom is important for class management, notifications, handling questions, study materials, etc.
RECOMMENDED STUDY STRATEGIES

• Discuss lecture notes with your peers. Review your lecture notes weekly
• Attempt all the tutorial exercises before class
• Participate in class: ask questions, contribute ideas, volunteer to present solutions, take notes, etc.
• Read the assigned readings in the textbook
• Work actively on your project and the labs
• Do further reading if you have time
WHAT IS STATISTICS?

1. Collecting Data
2. Presenting Data
3. Characterizing Data
4. Other activities: estimation, hypothesis testing…

[Figure labels: Data Analysis, Question?, Decision-Making]


WHAT IS STATISTICS?

• Data: facts and figures collected
• Statistics is the science of data. It involves collecting, classifying, summarizing, organizing, analyzing, and interpreting data
STATISTICAL METHODS

Statistical Methods
• Descriptive Statistics
• Inferential Statistics
DESCRIPTIVE STATISTICS

1. Involves
• Summarizing Data
• Presenting Data
• Looking for patterns in data

2. Purpose
• Describe Data

Example summary values: $\bar{X} = 30.5$, $S^2 = 113$
Figure source: https://chartio.com/learn/charts/grouped-bar-chart-complete-guide/
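
As a minimal sketch of what summary values such as a sample mean and a sample variance describe, the snippet below computes both with Python's standard `statistics` module. The data values are made up for illustration and are not the data behind the figures on the slide.

```python
import statistics

# Hypothetical sample (illustrative values only, not from the lecture).
sample = [2, 4, 4, 5, 7, 8]

x_bar = statistics.mean(sample)      # sample mean
s2 = statistics.variance(sample)     # sample variance (divides by n - 1)

print(f"mean = {x_bar}, variance = {s2:.2f}")
```

Two numbers like these summarize the whole data set, which is the purpose of descriptive statistics.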
INFERENTIAL STATISTICS

1. Involves
• Estimation
• Hypothesis Testing
• …

2. Purpose
• Make Decisions About Population Characteristics

[Figure label: Population?]
STATISTICAL METHODS

1- Descriptive Statistics utilizes numerical and graphical methods to look for patterns in a data set, to summarize the information revealed in a data set, and to present the information in a convenient form.

2- Inferential Statistics utilizes sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data.
BASIC DEFINITIONS AND
CONCEPTS
1. Population: is a set of ALL units (usually people,
objects, or events) that we are interested in studying
Example:
2. Variable: is a characteristic or property of an
individual population unit. A variable can take on
different values.
Example:
BASIC DEFINITIONS AND
CONCEPTS (CONT.)
3. Sample: is a subset of the units of a population
Example:

4. Measurement: assigning numbers to objects

5. Observation:
EXAMPLE

Suppose we are interested in the number of hours per day the average Vietnamese high school student spends sending text messages. Work in pairs and answer the following questions:
• What would be the population?
• Why would a sample be necessary?
• What could be the variables of interest?
DATA TYPES

• Why are data types important?
• Basic types of data:
  - Quantitative (numerical) versus Qualitative (categorical)
• Examples
• Are there numbers that are not quantitative?
SCALES OF MEASUREMENT

Quantitative
• Ratio
• Interval

Qualitative
• Ordinal
• Nominal
DATA SOURCES

• Published sources
• Experimental studies
  - Attempt to control factors that affect the variable of interest
• Observational studies (including surveys)
  - Do not attempt to control factors that affect the variable of interest
APPLICATION AREAS
Economics
• Forecasting
• Demographics

Engineering
• Construction
• Materials

Business
• Consumer Preferences
• Financial Trends

Many other areas
• Research, Information Technology, Psychology, etc.
DESCRIPTIVE STATISTICS

Descriptive Statistics
• Graphical Methods
• Numerical Methods

We will now explore graphical methods.


GRAPHICAL METHODS FOR 1
CATEGORICAL VARIABLE

1 categorical variable
• Tabulating Data
  - Frequency Distribution Table
• Graphing Data
  - Bar Chart
  - Pie Chart
  - Pareto Diagram
SOME TERMS
1- A class is one of the categories into which
qualitative data can be classified.
2- The class frequency is the number of
observations in the data set falling into a
particular class. (Frequency = Count).
3- The class relative frequency is the class
frequency divided by the total number of
observations in the data set.
FREQUENCY DISTRIBUTION TABLE
Example Q: Identify the variable and its type?

Highest Degree Obtained   Number of CEOs   Proportion
(Class)                   (Frequency)      (Relative Frequency)
None                           1           0.04
Bachelors                      7           0.28
Masters                       11           0.44
Doctorate/Law                  6           0.24
Total                         25           1.00
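
The definitions of class frequency and class relative frequency can be checked against this table with a short Python sketch. It rebuilds the 25 observations from the counts shown in the table; the function name `frequency_table` is a hypothetical helper introduced only for this illustration.

```python
from collections import Counter

# Rebuild the raw observations from the class frequencies in the table
# (1 None, 7 Bachelors, 11 Masters, 6 Doctorate/Law).
degrees = (["None"] * 1 + ["Bachelors"] * 7
           + ["Masters"] * 11 + ["Doctorate/Law"] * 6)

def frequency_table(observations):
    """Return {class: (frequency, relative frequency)}."""
    counts = Counter(observations)
    n = len(observations)
    return {cls: (freq, freq / n) for cls, freq in counts.items()}

table = frequency_table(degrees)
for cls, (freq, rel) in table.items():
    print(f"{cls:<14} {freq:>3} {rel:.2f}")
```

The relative frequencies always sum to 1, which is a quick check that no observation was lost while tabulating.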
ANOTHER EXAMPLE

Q: Identify the variable and its type?

Major (each row is a category)   Count
Accounting                        130
Economics                          20
Management                         50
Total                             200
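
The Pareto diagram listed among the graphing methods is a bar chart whose categories are sorted from most to least frequent, usually shown with a cumulative percentage. As a rough sketch, the Python snippet below orders the major counts from this table that way; `pareto_rows` is a hypothetical helper name, not something from the lecture.

```python
# Counts copied from the Major table.
majors = {"Accounting": 130, "Economics": 20, "Management": 50}

def pareto_rows(counts):
    """Sort classes by descending frequency and add cumulative percent."""
    total = sum(counts.values())
    rows, running = [], 0
    for cls, freq in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += freq
        rows.append((cls, freq, 100 * running / total))
    return rows

for cls, freq, cum_pct in pareto_rows(majors):
    print(f"{cls:<11} {freq:>4} {cum_pct:6.1f}%")
```

Accounting comes first with 65% of the students, and the cumulative column reaches 100% at the last category.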
BAR CHART
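Note that we may wish to use a vertical bar chart.

[Figure: horizontal bar chart of Major (Acct., Econ., Mgmt.) against Frequency, axis marked 0, 50, 100, 150]
• Horizontal bars for categorical variables
• Bar length shows frequency or %
• Equal bar widths
• 1/2 to 1 bar width (spacing between bars)
• Axis starts at the zero point
• Percent is also used, as well as frequency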
PIE CHART

1. Shows Breakdown of Total Quantity into Categories
2. Angle Size
• (360°)(Percent)
• Example: (360°)(10%) = 36°

[Figure: pie chart of Majors: Acct. 65%, Mgmt. 25%, Econ. 10% (36°)]
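
The angle rule can be applied to every slice at once. The sketch below assumes the major counts from the earlier example table (130, 20, and 50 out of 200), which match the percentages in the pie chart; `slice_angles` is a hypothetical helper name.

```python
# Counts assumed from the Major table in the earlier example.
majors = {"Acct.": 130, "Econ.": 20, "Mgmt.": 50}

def slice_angles(counts):
    """Angle of each pie slice in degrees: 360 * (class share of the total)."""
    total = sum(counts.values())
    return {cls: 360 * freq / total for cls, freq in counts.items()}

angles = slice_angles(majors)
for cls, angle in angles.items():
    print(f"{cls:<6} {angle:6.1f} degrees")
```

This gives 234° for Acct., 36° for Econ., and 90° for Mgmt., which add up to the full 360°.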
SMALL DISCUSSION

When should we use a bar chart, and when a pie chart?


GRAPHICAL METHODS FOR 1
NUMERICAL VARIABLE

1 numerical variable
• Frequency Distribution
• Histogram
• Stem-and-Leaf Display
STEM-AND-LEAF DISPLAY

Divide Each Observation into Stem Value and Leaf Value
• Stem Value Defines Class
• Leaf Value Defines Frequency (Count)

Source: https://www.mathsisfun.com/data/stem-leaf-plots.html

Sample size for stem-and-leaf display: 10-50
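
A stem-and-leaf display can be sketched in a few lines of Python. The example assumes non-negative whole numbers, takes the units digit as the leaf and the remaining digits as the stem, and uses made-up data; `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens and above) to its sorted leaves (units digit)."""
    display = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        display[stem].append(leaf)
    return dict(display)

# Hypothetical two-digit observations, for illustration only.
data = [23, 25, 31, 34, 34, 38, 42, 47, 47, 49, 50, 56]

for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each row is one class (the stem), and the number of leaves in the row is that class's frequency. Stems with no observations are simply skipped in this sketch.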


EXAMPLE: HUDSON AUTO REPAIR

The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in her shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed below.
FREQUENCY DISTRIBUTION TABLE STEPS

1. Determine Range
2. Select Number of Classes
   • Usually Between 5 & 15 Inclusive
3. Compute Class Width
4. Determine Class Boundaries (Limits)
5. Compute Class Midpoints
6. Count Observations & Assign to Classes
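
The six steps can be sketched as a small Python function. This is one possible reading of the steps, not the only one: it assumes classes of equal width that include the lower limit but not the upper limit, starts the first class at the minimum, and rounds the class width up to a whole number. The data are made up, and `frequency_distribution` is a hypothetical helper name.

```python
import math

def frequency_distribution(data, num_classes):
    """Build equal-width classes [lower, upper) and count observations."""
    data_range = max(data) - min(data)                # step 1: range
    # step 2: num_classes is chosen by the analyst (usually 5 to 15)
    width = math.ceil(data_range / num_classes)       # step 3: class width
    if width * num_classes == data_range:
        width += 1    # make sure the maximum falls inside the last class
    lower = min(data)
    classes = []
    for i in range(num_classes):                      # steps 4 and 5
        lo, hi = lower + i * width, lower + (i + 1) * width
        classes.append({"lower": lo, "upper": hi,
                        "midpoint": (lo + hi) / 2, "count": 0})
    for x in data:                                    # step 6: assign and count
        classes[int((x - lower) // width)]["count"] += 1
    return classes

# Hypothetical observations, for illustration only.
data = [52, 57, 61, 63, 68, 71, 74, 75, 79, 82, 88, 91, 97, 104]
classes = frequency_distribution(data, num_classes=6)
for c in classes:
    print(f"{c['lower']}-{c['upper']} midpoint {c['midpoint']} count {c['count']}")
```

In practice the class limits are often rounded to convenient values (such as multiples of 10) rather than starting exactly at the minimum.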
FREQUENCY DISTRIBUTION

Parts Cost ($)   Frequency   Percent
50-60                 2          4
60-70                13         26
70-80                16         32
80-90                 7         14
90-100                7         14
100-110               5         10
Total                50        100

Note: in this example, class 50-60 means parts costs from $50 up to but not including $60. Other classes can be interpreted similarly.
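
The percent column and the "up to but not including" rule can be reproduced with a short Python sketch. The class limits and frequencies are copied from the table; `class_of` is a hypothetical helper name, and the midpoints are computed here although the slide does not list them.

```python
# (lower limit, upper limit, frequency) copied from the table.
classes = [(50, 60, 2), (60, 70, 13), (70, 80, 16),
           (80, 90, 7), (90, 100, 7), (100, 110, 5)]

n = sum(freq for _, _, freq in classes)    # total number of invoices

rows = []
for lower, upper, freq in classes:
    midpoint = (lower + upper) / 2
    percent = 100 * freq / n
    rows.append((lower, upper, midpoint, freq, percent))
    print(f"{lower:>3}-{upper:<3} midpoint {midpoint:5.1f} {freq:>3} {percent:5.1f}%")

def class_of(cost):
    """Return the class containing cost: lower limit included, upper excluded."""
    for lower, upper, _ in classes:
        if lower <= cost < upper:
            return (lower, upper)
    return None

print(class_of(60))   # a $60 part belongs to class 60-70, not 50-60
```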
HISTOGRAM EXAMPLE
[Figure: histogram titled "Tune-up Parts Cost"; x-axis: Parts Cost ($), marked 50 to 110 in steps of 10; y-axis: Frequency, marked 2 to 18 in steps of 2]
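
For readers without the figure, a rough text version of the histogram can be drawn from the frequencies in the tune-up parts cost distribution. This is only a sketch using the Python standard library; `text_histogram` is a hypothetical helper name.

```python
# Frequencies from the tune-up parts cost frequency distribution.
frequencies = {"50-60": 2, "60-70": 13, "70-80": 16,
               "80-90": 7, "90-100": 7, "100-110": 5}

def text_histogram(freqs, symbol="*"):
    """Return one line per class; bar length equals the class frequency."""
    return [f"{label:>7} | {symbol * count} ({count})"
            for label, count in freqs.items()]

print("\n".join(text_histogram(frequencies)))
```

The longest bar belongs to class 70-80, and the bars fall away more slowly on the high-cost side than on the low-cost side.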
MORE ON HISTOGRAM

• Exercise: describe the characteristics of the histogram in the previous slide
• Shapes of histogram
• Advantages of histogram
• Limitations of histogram
• Sample size consideration: n at least 20; recommended: n at least 50.
DESCRIBING DISTRIBUTIONS

Start of analysis: plot data

Look for
• Overall pattern
  - Shape
  - Center
  - Spread
• Outliers
• Modality
• Possible groups
SOME GUIDELINES ON GRAPHS

• Provide a main title
• Label the axes
• Try to start both the X and Y axes at 0
• Avoid distorting data
• Abolish chartjunk: don't use patterns, 3-D effects, shadows, pictures…
• Avoid excess ink
CONCLUSION

1. Introduce Course Objectives and Requirements
2. Introduce Statistics and its Applications
3. Describe Types of Data, Ways of Collecting Data
4. Introduce Graphical Methods for Describing Data of 1 Variable
