
Lecture 1 Introduction

Basic Concept of Statistics

Learning Objectives
In this chapter you learn to:
● Understand the types of variables used in statistics
● Know the different measurement scales
● Know how to collect data
● Know the different ways to collect a sample
● Understand the types of survey errors

Some Key Terms


♦ VARIABLE
● A characteristic of an item or individual.

♦ DATA
● The set of individual values associated with a variable.

♦ STATISTICS
● The methods that help transform data into useful information for decision makers.

Types of Variables

♦ Categorical (qualitative) variables


● Have values that can only be placed into categories, such as “yes” and “no.”

♦ Numerical (quantitative) variables


● Have values that represent a counted or measured quantity.


Levels of Measurement

♦ Nominal
● Classifies data into distinct categories in which no ranking is implied.

Categorical variable                 Categories
Do you have a Facebook profile?      Yes, No
Type of investment                   Growth, Value, Other

♦ Ordinal
● Classifies data into distinct categories in which ranking is implied.

♦ Interval
● An ordered scale in which the difference between measurements is a
meaningful quantity but the measurements do not have a true zero point.

♦ Ratio
● An ordered scale in which the difference between the measurements is a
meaningful quantity and the measurements have a true zero point.
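The practical consequence of the four levels is that each one permits more kinds of summary than the one before it. The sketch below encodes one common textbook convention for this; the dictionary `MEANINGFUL_SUMMARIES` and the helper `is_meaningful` are hypothetical names introduced only for illustration, and the exact lists of summaries are an assumption rather than something stated in these notes.

```python
# Illustrative lookup: which summaries are usually considered meaningful
# at each level of measurement (a common convention, assumed here).
MEANINGFUL_SUMMARIES = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "mean", "difference"],
    "ratio": ["mode", "median", "mean", "difference", "ratio"],
}


def is_meaningful(level, summary):
    """Return True if `summary` is listed for the given measurement level."""
    return summary in MEANINGFUL_SUMMARIES[level.lower()]


# An interval scale has no true zero, so "twice as much" is not meaningful.
print(is_meaningful("interval", "ratio"))  # False
# A ratio scale has a true zero, so ratios of measurements are meaningful.
print(is_meaningful("ratio", "ratio"))     # True
# Categories with no ranking cannot be ordered, so a median is not meaningful.
print(is_meaningful("nominal", "median"))  # False
```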


Sources of Data
♦ Primary sources:
● The data collector is the one using the data for analysis
● Data from a political survey
● Data collected from an experiment
● Observed data

♦ Secondary sources:
● The person performing data analysis is not the data collector
● Analysing census data
● Examining data from print journals or data published on the internet

Population and Sample


♦ Population
● Consists of all the items or individuals about which you want to draw a
conclusion.
● The population is the “large group”.
● A numerical value calculated from a population is known as a “parameter”.

♦ Sample
● The portion of a population selected for analysis.
● The sample is the “small group”.
● A numerical value calculated from a sample is known as a “statistic”.
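The distinction between a parameter and a statistic can be made concrete with a small simulation. This is only a sketch: the population of student heights is generated with made-up settings (a mean of 67 inches and a spread of 3 inches are assumptions chosen for illustration), and the sample size of 50 is arbitrary.

```python
import random
import statistics

random.seed(1)  # fixed seed so repeated runs give the same numbers

# Hypothetical population: heights in inches of 1,000 students (made-up values).
population = [random.gauss(67, 3) for _ in range(1000)]

# Parameter: a numerical value calculated from the whole population.
population_mean = statistics.mean(population)

# Sample: a portion of the population selected for analysis.
sample = random.sample(population, 50)

# Statistic: a numerical value calculated from the sample.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two printed values will usually be close but not identical, which is why a statistic is treated as an estimate of the parameter rather than as the parameter itself.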


Branches of Statistics

As a subject, Statistics can be divided into Descriptive Statistics and Inferential
Statistics.

1. Descriptive Statistics

The branch of statistics which deals with concepts and methods concerned
with the summarization and description of the important aspects of numerical
data. Here the data are condensed into graphs, tables and numerical
quantities that provide information about the centre of the data and indicate
the dispersion of the observations.
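As a minimal sketch of this idea, the snippet below condenses a small data set into two measures of centre and two measures of dispersion using Python's standard `statistics` module. The waiting times are made-up values used only for illustration.

```python
import statistics

# Made-up waiting times in minutes for seven customers.
waiting_times = [4, 7, 2, 9, 5, 7, 3]

# Measures of centre.
centre_mean = statistics.mean(waiting_times)
centre_median = statistics.median(waiting_times)

# Measures of dispersion.
spread_range = max(waiting_times) - min(waiting_times)
spread_stdev = statistics.stdev(waiting_times)  # sample standard deviation

print(f"mean:   {centre_mean:.2f}")
print(f"median: {centre_median}")
print(f"range:  {spread_range}")
print(f"stdev:  {spread_stdev:.2f}")
```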

2. Inferential Statistics

The branch of statistics which deals with procedures for making inferences about
the characteristics of a larger group of data, or population, from the knowledge
derived from only a part of the data, i.e. a sample. Here, estimation of population
parameters and testing of hypotheses are carried out on the basis of probability theory.
Because the inferences are made on the basis of sample evidence, they cannot be
absolutely certain.
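A short sketch of estimation follows. It uses figures that also appear in one of the practice questions in these notes (160 Business majors in a sample of 800 students) and attaches an approximate 95% interval to the sample proportion. The normal-approximation interval used here is an assumption chosen for illustration; the notes themselves do not prescribe a particular method.

```python
import math
from statistics import NormalDist

n = 800                # sample size
business_majors = 160  # Business majors observed in the sample

# Point estimate of the population proportion (a sample statistic).
p_hat = business_majors / n

# Normal-approximation interval (assumed method, for illustration only).
z = NormalDist().inv_cdf(0.975)  # critical value for 95% confidence
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
lower = p_hat - z * standard_error
upper = p_hat + z * standard_error

print(f"estimate: {p_hat:.3f}")
print(f"approximate 95% interval: ({lower:.3f}, {upper:.3f})")
```

The interval has a non-zero width because the conclusion about all students rests on sample evidence only, which is the uncertainty described above.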


THE NATURE AND SOURCES OF DATA FOR STATISTICAL ANALYSIS
Types of Data
Three types of data may be available for empirical analysis: time series, cross-section, and
pooled (i.e., combination of time series and cross section) data.
1. Time Series Data
Time series data is a collection of observations obtained through repeated measurements over
time. Plot the points on a graph, and one of your axes would always be time.
Time series data is everywhere, since time is a constituent of everything that is observable.
As our world gets increasingly instrumented, sensors and systems are constantly emitting a
relentless stream of time series data. Such data has numerous applications across various
industries. Let’s put this in context through some examples.
Time series data can be useful for:

● Tracking daily, hourly, or weekly weather data
● Tracking changes in application performance
● Visualizing vitals from medical devices in real time
● Tracking network logs
● Checking the trend of daily COVID-19 cases for each country

2. Cross-Sectional Data

Cross-sectional data is a collection of observations (behaviours) for multiple
subjects (entities such as different individuals or groups) at a single point in time.
For example: Max Temperature, Humidity and Wind (all three behaviours) in New York
City, SFO, Boston, Chicago (multiple entities) on 1/1/2015 (single instance).

In cross-sectional studies, there is no natural ordering of the observations (e.g. explaining
people’s wages by reference to their respective education levels, where the individuals’ data
could be entered in any order).

Other examples: the closing price of a group of 50 stocks at a given moment in time, the
inventory of a given product in stock at specific stores, and a list of grades obtained by a
class of students on a given exam.

3. Panel Data (Longitudinal Data)

Panel data is also called cross-sectional time series data, as it is a combination of the
above-mentioned types (i.e., a collection of observations for multiple subjects at multiple
instances).
Panel data or longitudinal data is multi-dimensional data involving measurements over time.
Panel data contains observations of multiple phenomena obtained over multiple time periods
for the same firms or individuals. A study that uses panel data is called a longitudinal study or
panel study.
For example: Max Temperature, Humidity and Wind (all three behaviours) in New York
City, SFO, Boston, Chicago (multiple entities) on the first day of every year (multiple
intervals of time).
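The relationship between the three types of data can be sketched with a tiny panel. The temperatures below are placeholder values invented for illustration, not real measurements, and only three of the cities and three dates are used to keep the example short. Fixing the date gives a cross-section, and fixing the city gives a time series.

```python
# Each row is (city, date, max temperature); all values are made-up placeholders.
panel = [
    ("New York City", "2015-01-01", 39),
    ("Boston",        "2015-01-01", 33),
    ("Chicago",       "2015-01-01", 32),
    ("New York City", "2016-01-01", 42),
    ("Boston",        "2016-01-01", 41),
    ("Chicago",       "2016-01-01", 29),
    ("New York City", "2017-01-01", 48),
    ("Boston",        "2017-01-01", 45),
    ("Chicago",       "2017-01-01", 36),
]

# Cross-sectional data: multiple entities at a single point in time.
cross_section = [row for row in panel if row[1] == "2015-01-01"]

# Time series data: one entity measured repeatedly over time.
time_series = [row for row in panel if row[0] == "Boston"]

print("cross-section:", cross_section)
print("time series:  ", time_series)
```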

For Practice
Question 1:
For each of the following variables, determine whether the variable is categorical or
numerical. If the variable is numerical, determine whether it is discrete or continuous. In
addition, determine the level of measurement.

a. Number of televisions per household


b. Size of drink (Small/Regular/Large)
c. Waiting time (in minutes) of a customer
d. Suburb of residence

Question 2:
Suppose the following information is collected from Andrew and Final Chen on their
application for a home mortgage loan at Metro Home Loans. Classify each of the responses by type of
data and level of measurement.

a. Monthly expenses: $2,056


b. Number of dependants being supported by applicant(s)
c. Annual family salary income: $105,000
d. Marital status: Married
Question 3: In inferential statistics, we study
a. the methods to make decisions about population based on sample results
b. how to make decisions about mean, median, or mode
c. how a sample is obtained from a population
d. None of the above
Question 4: In descriptive statistics, we study
a. The description of decision making process
b. The methods for organizing, displaying, and describing data
c. How to describe the probability distribution
d. None of the above
Question 5: When data are collected in a statistical study for only a portion or subset
of all elements of interest we are using:
a. A sample

b. A Parameter
c. A Population
d. Both b and c
Question 6: In statistics, a sample means
a. A portion of the sample
b. A portion of the population
c. all the items under investigation
d. none of the above
Question 7: The height of a student is 60 inches. This is an example of ______
a. Qualitative data
b. Categorical data
c. Continuous data
d. Discrete data
Question 8: Data in the Population Census Report is:
a. Ungrouped data
b. Secondary data
c. Primary data
d. Arrayed data
Question 9: A statistic is a numerical quantity which is calculated from:
a. Population
b. Sample
c. Data
d. Observations
Question 10: Which branch of statistics deals with the techniques that are used to
organize, summarize, and present the data:
a. Advanced Statistics
b. Probability Statistics
c. Inferential Statistics
d. Descriptive Statistics
e. Bayesian Statistics


Question 11: You asked five of your classmates about their height. On the basis
of this information, you stated that the average height of all students in your university or
college is 67 inches. This is an example of:
a. Descriptive statistics
b. Inferential Statistics
c. Parameter
d. Population

Question 12: A numerical value used as a summary measure for a sample, such as sample
mean, is known as a

a. population parameter

b. sample parameter

c. sample statistic

d. population mean

Question 13: In a sample of 800 students in a university, 160, or 20%, are Business majors.
Based on the above information, the school's paper reported that "20% of all the students at
the university are Business majors." This report is an example of

a. a sample
b. a population
c. statistical inference
d. descriptive statistics
