
7/7/2021

Introduction to Statistics

By
Prof. Vishal Singh Patyal

Learning Objectives

• What is Statistics
• Why Statistics
• Basic vocabulary used in Statistics
• Sources of data and its types
• Types of Variables
• Level of Measurement


What is Statistics?

The science of collecting, analyzing, interpreting, and presenting numerical data.

“Statistics is a way to get information from data”

Data → Statistics → Information

Data: Facts, especially numerical facts, collected together for reference or information.
Information: Knowledge communicated concerning some particular fact.
Statistics is a tool for creating new understanding from a set of numbers.

What is statistics?

• The word “statistics” is used in 3 main ways:


– Common meaning : factual information involving
numbers. A better word for this is data
– Precise meaning: quantities which have been
derived from sample data, e.g. the mean (or
average) of a data set
– Common meaning: an academic subject which
involves reasoning about statistical quantities


Why Study Statistics?

• There are at least three reasons for studying statistics:
– Data are everywhere.
– Statistical techniques are used to make many decisions that affect our lives.
– No matter what your career, you will make professional decisions that involve data. An understanding of statistical methods will help you make these decisions more effectively.

Statistics in Business
• Accounting — auditing and cost estimation
• Economics — local, regional, national, and
international economic performance
• Finance — investments and portfolio management
• Management — human resources, compensation, and
quality management
• Management Information Systems — performance of
systems which gather, summarize, and disseminate
information to various managerial levels
• Marketing — market analysis and consumer research
• International Business — market and demographic
analysis


Basic vocabulary used in Statistics

Population
• A population consists of all the items
or objects about which you want to
draw a conclusion.
• The objects can be people, animals,
plants, etc.
• Population size is usually very large (e.g. human beings) but can also be very small (e.g. pandas).
• A study that covers the entire population is called a census.
• The population size is denoted by N.


Object Characteristics & Data


We are usually interested in certain characteristics of the object.

Data on these characteristics can be obtained by:
• Measuring (weight)
• Counting (moles)
• Asking (marital status)
• Observing (eye colour)
• Computing (BMI)

Sample

• A sample is a subset of objects drawn from the population of interest.
• The sample size is denoted by n.
• n is usually smaller than the population size N.
• It is important to determine the sample size n before drawing the sample from the population.
• E.g. a sample of 765 voters exit-polled on election day.


Population


Population and Census Data

Identifier  Color  MPG
RD1         Red    12
RD2         Red    10
RD3         Red    13
RD4         Red    10
RD5         Red    13
BL1         Blue   27
BL2         Blue   24
GR1         Green  35
GR2         Green  35
GY1         Gray   15
GY2         Gray   18
GY3         Gray   17


Sample and Sample Data

Identifier  Color  MPG
RD2         Red    10
RD5         Red    13
GR1         Green  35
GY2         Gray   18


Parameter vs. Statistic


• Parameter: descriptive measure of the population
– Usually represented by Greek letters
μ denotes population mean
σ² denotes population variance
σ denotes population standard deviation
• Statistic: descriptive measure of a sample
– Usually represented by Roman letters
x̄ denotes sample mean
s² denotes sample variance
s denotes sample standard deviation
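
As a rough illustration of the distinction, the sketch below treats the twelve cars in the Population and Census Data table as the population and the four cars in the Sample and Sample Data table as the sample. The variable names are illustrative choices, not notation from the slides.

```python
import statistics

# MPG for all 12 cars in the census table (the population, N = 12)
population_mpg = [12, 10, 13, 10, 13, 27, 24, 35, 35, 15, 18, 17]
# MPG for the 4 sampled cars RD2, RD5, GR1, GY2 (the sample, n = 4)
sample_mpg = [10, 13, 35, 18]

# Parameters: computed from every member of the population
mu = statistics.mean(population_mpg)             # population mean
sigma_sq = statistics.pvariance(population_mpg)  # population variance (divides by N)
sigma = statistics.pstdev(population_mpg)        # population standard deviation

# Statistics: computed from the sample only
x_bar = statistics.mean(sample_mpg)      # sample mean
s_sq = statistics.variance(sample_mpg)   # sample variance (divides by n - 1)
s = statistics.stdev(sample_mpg)         # sample standard deviation

print(f"Parameters: mu = {mu:.2f}, sigma^2 = {sigma_sq:.2f}, sigma = {sigma:.2f}")
print(f"Statistics: x_bar = {x_bar:.2f}, s^2 = {s_sq:.2f}, s = {s:.2f}")
```

The sample values differ from the population values because only four of the twelve cars were observed.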


Process of Inferential Statistics

1. Population (parameter μ)
2. Select a random sample
3. Sample (statistic x̄)
4. Use x̄ to estimate μ
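
The four steps can be sketched in a few lines of Python. The population here is synthetic (1,000 hypothetical ages generated at random), and the seed and sample size are arbitrary assumptions made only so the run is repeatable.

```python
import random
import statistics

random.seed(42)  # arbitrary fixed seed so the run is repeatable

# 1. Population: hypothetical ages of N = 1000 people
population = [random.randint(18, 65) for _ in range(1000)]
mu = statistics.mean(population)  # the parameter, normally unknown in practice

# 2. Select a random sample of n = 50 without replacement
sample = random.sample(population, 50)

# 3. Compute the statistic from the sample
x_bar = statistics.mean(sample)

# 4. Use x_bar to estimate mu
print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}, estimation error = {x_bar - mu:.2f}")
```

In a real study only steps 2 to 4 are available; mu is computed here just to show how close the estimate lands.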


Statistics in Business

• Inferences about parameters are made under conditions of uncertainty (which are always present in statistics)
– Uncertainty can be caused by
• Randomness in selection of a sample
• Lack of knowledge about the source of the inferences
• Change in conditions not accounted for
• Probability is used in statistics
– To estimate the level of confidence in a confidence interval
– To calculate the p-value in hypothesis testing


Basic Vocabulary of Statistics

Variable
• A variable is some characteristic of a population or sample.
E.g. student grades. Typically denoted with a capital letter: A,
B, C…
• The values of the variable are the range of possible values for
a variable.
E.g. student marks (0..100)
Data
• Data are the observed values of a variable.
• Data are the different values associated with a variable.
E.g. student marks: {67, 74, 71, 83, 93, 55, 48}


Example
The dean of IIMV Institute is interested in learning about the average age of the faculty. Identify the basic terms in this situation.

The population is the ages of all (30) faculty members at the Institute.
A sample is any subset of that population. For example, we might select 5 faculty members and determine their ages.
The variable is the “age” of each faculty member.
The data would be the set of ages observed in the sample.
The parameter of interest is the “average” age of all faculty at the Institute.
The statistic is the “average” age of the faculty in the sample.


Statistics


Descriptive Statistics

• Descriptive statistics is the branch of statistics that summarizes, presents, and analyzes data and reaches conclusions about that same group only.
• Descriptive statistics includes methods of
– organizing
– summarizing
– analyzing
– presenting data in an informative way
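
A minimal sketch of organizing and summarizing, using the student marks listed earlier under Basic Vocabulary of Statistics. The choice of summaries (mean, median, range) is an illustrative assumption rather than a prescribed list.

```python
import statistics

# Student marks from the vocabulary slide
marks = [67, 74, 71, 83, 93, 55, 48]

n = len(marks)
ordered = sorted(marks)                  # organizing: put the data in order
mean = sum(marks) / n                    # summarizing: sample mean = sum of X_i / n
median = statistics.median(marks)        # summarizing: middle value of the ordered data
value_range = max(marks) - min(marks)    # summarizing: spread between extremes

print("Ordered marks:", ordered)
print(f"n = {n}, mean = {mean:.2f}, median = {median}, range = {value_range}")
```

These numbers describe only the seven marks in hand; they make no claim about any larger group of students.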


Descriptive Statistics

• Collect data
– ex. Survey
• Present data
– ex. Tables and graphs
• Characterize data
– ex. Sample mean = ΣXᵢ / n

• Collect
• Organize
• Summarize
• Display
• Analyze

Inferential Statistics

• Another facet of statistics is inferential statistics, also called statistical inference or inductive statistics.
• Statistical inference is the branch of statistics that deals with drawing valid inferences about population parameters on the basis of sample data.


Inferential Statistics

• Estimation
– ex. Estimate the population mean weight using the sample mean weight
• Hypothesis testing
– ex. Test the claim that the population mean weight is 120 pounds

• Predict and forecast values of population parameters
• Test hypotheses about values of population parameters
• Make decisions

Drawing conclusions and/or making decisions concerning a population based on sample results.
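
The two examples can be sketched as follows. The ten weights are hypothetical data invented for illustration, and the one-sample t statistic is one common way to test such a claim, not necessarily the method used later in the course. The critical value 2.262 is the two-sided 5% value for 9 degrees of freedom, taken from a standard t table.

```python
import math
import statistics

# Hypothetical sample of n = 10 weights in pounds (invented for illustration)
weights = [118, 122, 125, 119, 121, 117, 124, 120, 123, 116]
n = len(weights)

# Estimation: the sample mean is a point estimate of the population mean
x_bar = statistics.mean(weights)
s = statistics.stdev(weights)  # sample standard deviation (n - 1 divisor)

# Hypothesis testing: claim that the population mean weight is 120 pounds
mu_0 = 120
standard_error = s / math.sqrt(n)
t_stat = (x_bar - mu_0) / standard_error

t_critical = 2.262  # two-sided 5% critical value, 9 degrees of freedom (t table)
reject_claim = abs(t_stat) > t_critical

print(f"x_bar = {x_bar}, s = {s:.3f}, t = {t_stat:.3f}")
print("Reject the claim" if reject_claim else "Do not reject the claim")
```

With these particular numbers the sample mean is close enough to 120 that the claim is not rejected; a different sample could lead to a different decision.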


Sources of Data

Sources of data: Primary, Secondary


Sources of Data
• Primary Sources:
The data collector is the one using the data for analysis
– Data from a political survey
– Data collected from an experiment
– Observed data
• Secondary Sources:
The person performing data analysis is not the data collector
– Analyzing census data
– Examining data from print journals or data published on the internet


Primary Data

Merits
• It is original in nature
• It is more reliable, authentic and accurate
• It is generally free from bias
• It can be used with greater confidence

Demerits
• It is expensive
• It is time consuming
• Sometimes it is difficult to approach the exact source
• Collection of primary data usually involves creating new definitions and measuring instruments


Secondary Data

Merits
• It is readily available
• It is much less expensive as compared to primary data
• It is less time consuming as compared to primary data

Demerits
• The data may not have been collected through proper procedures
• They may have been influenced by biased investigation
• They may be out of date and not suitable for the present period
• They may not satisfy a reasonable standard of accuracy


Types of Variables

Data
• Categorical (defined categories)
– Examples: Marital Status, Political Party, Eye Color
• Numerical
– Discrete (counted items)
Examples: Number of Children, Defects per hour
– Continuous (measured characteristics)
Examples: Weight, Voltage


Types of Variables
Categorical
• Qualitative variables have values that can only be placed into
categories, such as “yes” and “no.”
• A variable that categorizes or describes an element of a
population.
Note: Arithmetic operations, such as addition and averaging, are not
meaningful for data resulting from a qualitative variable
Numerical
• Quantitative variables have values that represent quantities.
• A variable that quantifies an element of a population.
Note: Arithmetic operations such as addition and averaging, are
meaningful for data resulting from a quantitative variable.
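
The two notes can be illustrated with a short sketch. Both data sets are hypothetical values made up for the example; the point is only which operations make sense for each kind of variable.

```python
from collections import Counter
import statistics

# Hypothetical observations for six people (invented for illustration)
eye_color = ["brown", "blue", "brown", "green", "brown", "blue"]  # categorical
weight_kg = [61.5, 72.0, 58.3, 80.1, 66.4, 69.7]                  # numerical

# Categorical: counting per category is meaningful, averaging is not
counts = Counter(eye_color)
most_common_color = counts.most_common(1)[0][0]

# Numerical: addition and averaging are meaningful
total_weight = sum(weight_kg)
mean_weight = statistics.mean(weight_kg)

print("Counts per category:", dict(counts))
print("Most common category:", most_common_color)
print(f"Total weight = {total_weight:.1f} kg, mean weight = {mean_weight:.1f} kg")
```

There is no sensible "average eye colour", which is why the categorical data are summarized by counts and the most frequent category instead.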


Example
Identify each of the following examples as attribute (categorical) or numerical variables.

• The amount of CNG pumped by the next 10 customers at the local HP pump. (Numerical)
• The amount of radon in the basement of each of 25 homes in a new development. (Numerical)
• The color of the baseball cap worn by each of 20 students. (Categorical)
• The length of time to complete a mathematics homework assignment. (Numerical)
• The state in which each truck is registered when stopped and inspected at a weigh station. (Categorical)


Question?
Identify each of the following as examples of categorical or numerical variables:
• The temperature in Barrow, Alaska at 12:00 pm on any given day.
• The make of automobile driven by each faculty member.
• Whether or not a 6 volt lantern battery is defective.
• Models of cell phones.
• The length of time billed for a long distance telephone call.
• The brand of cereal children eat for breakfast.
• The type of book taken out of the library by an adult.


Level of Measurement

Ratio
Interval
Ordinal
Nominal

(NOIR: Nominal, Ordinal, Interval, Ratio)


Nominal scale
• A nominal scale classifies data into distinct categories in
which no ranking is implied.
• There must be distinct classes but these classes have no
quantitative properties. Therefore, no comparison can be
made in terms of one category being higher than the
other.
– Example: there are two classes for the variable gender, males and females. There are no quantitative properties for this variable or these classes and, therefore, gender is a nominal variable.
– Another example is religion: Hindu, Catholic, Protestant, Muslim, etc.
• Sometimes numbers are used to designate category
membership


Ordinal scale
• An ordinal scale classifies data into distinct categories in
which ranking is implied
• There are distinct classes but these classes have a natural
ordering or ranking. The differences can be ordered on
the basis of magnitude.
– Example - a gold medal reflects superior performance to a
silver or bronze medal in the Olympics. You can’t say a gold and
a bronze medal average out to a silver medal, though.
– Preference scales are typically ordinal – how much do you like
this cereal? Like it a lot, somewhat like it, neutral, somewhat
dislike it, dislike it a lot.
• Does not assume that the intervals between numbers are
equal


Example

Finishing place in a race (first place, second place, ...)

[Figure: finishing places (1st to 4th) shown above a time axis from 1 hour to 8 hours]


Interval scale
• An interval scale is an ordered scale in which the difference between measurements is a meaningful quantity but the measurements do not have a true zero point; that is, values can go below zero.
• It is possible to compare differences in magnitude, but
importantly the zero point does not have a natural
meaning.
• It captures the properties of nominal and ordinal
scales - used by most psychological tests.
• Example
– Percentage change in employment
– Percentage return on a stock
– Dollar change in stock price


Example
• The difference between 1 and 2 years of age is the same
amount as the difference between 21 and 22 years of
age, or 50 and 51, or 65 and 66.
• We can see that the same difference exists between 10 °C and 20 °C as between 25 °C and 35 °C. But we cannot say that 20 °C is twice as hot as 10 °C.
• Celsius temperature is an interval variable. It is
meaningful to say that 25 degrees Celsius is 3 degrees
hotter than 22 degrees Celsius, and that 17 degrees
Celsius is the same amount hotter (3 degrees) than 14
degrees Celsius.
• Notice, however, that 0 degrees Celsius does not have a
natural meaning. That is, 0 degrees Celsius does not
mean the absence of heat!


Ratio Scale
• Highest level of measurement
– Relative magnitude of numbers is meaningful
– Differences between numbers are comparable
– Location of origin, zero, is absolute (natural)
– Vertical intercept of unit of measure transform
function is zero
• Example
• Measurement like Height, Weight, and Volume
• Monetary Variables like Profit and Loss, Revenues,
Expenses
• Financial ratios like: P/E Ratio, Inventory Turnover,
and Quick Ratio.


Interval Vs Ratio
• In an interval scale, you can take difference of two values.
You may not be able to take ratios of two values.
• Example: temperature in Celsius.
– You can say that if temperature in Delhi is 40 deg Celsius and
that in Shimla is 20 deg Celsius, then Delhi is 20 deg Celsius
hotter than Shimla (taking difference).
– But you cannot say Delhi is twice as hot as Shimla (not
allowed to take ratio).
• In a ratio scale, you can take a ratio of two values.
• Example
– 40 kg is twice as heavy as 20 kg (taking ratios).
– Also, “0” on ratio scale means the absence of that physical
quantity. “0” on interval scale doesn't mean the same.
– 0 kg means the absence of weight.
– 0 deg Celsius doesn't mean absence of heat.
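
One way to see why ratios fail on an interval scale is to change units. In the sketch below, the Delhi and Shimla temperatures and the 40 kg and 20 kg weights come from the slide; the helper function names and the approximate factor of 2.20462 pounds per kilogram are assumptions added for illustration.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvin."""
    return c + 273.15


def kg_to_lb(kg):
    """Convert kilograms to pounds (approximate conversion factor)."""
    return kg * 2.20462


delhi_c, shimla_c = 40.0, 20.0

# Interval scale: the "ratio" changes with the unit, so it carries no meaning
ratio_c = delhi_c / shimla_c
ratio_f = celsius_to_fahrenheit(delhi_c) / celsius_to_fahrenheit(shimla_c)
ratio_k = celsius_to_kelvin(delhi_c) / celsius_to_kelvin(shimla_c)

# Ratio scale: the ratio is the same in any unit because zero means "none"
ratio_kg = 40.0 / 20.0
ratio_lb = kg_to_lb(40.0) / kg_to_lb(20.0)

print(f"Temperature ratio: Celsius {ratio_c:.3f}, Fahrenheit {ratio_f:.3f}, Kelvin {ratio_k:.3f}")
print(f"Weight ratio: kg {ratio_kg:.3f}, lb {ratio_lb:.3f}")
```

The temperature ratio comes out differently in each unit, while the weight ratio stays at 2 whichever unit is used.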

The Hierarchy of Levels

Ratio: Absolute zero
Interval: Distance is meaningful
Ordinal: Attributes can be ordered
Nominal: Attributes are only named; weakest


Example
• Many changes continue to occur in the
healthcare industry.
• Because of increased competition for patients
among providers and the need to determine how
providers can better serve their clientele,
hospital administrators sometimes administer a
quality satisfaction survey to their patients after
the patient is released.
• The following types of questions are sometimes
asked on such a survey.
• These questions will result in what level of data
measurement?


Sample Questions
• How long ago were you released from the hospital?
• Which type of unit were you in for most of your stay?
– Coronary care
– Intensive care
– Maternity care
– Medical unit
– Pediatric /children’s unit
– Surgical unit
• In choosing a hospital, how important was the
hospital’s location? (circle one)
Very Important    Somewhat Important    Not Very Important    Not at All Important


• How serious was your condition when you were first admitted to the hospital?
__Critical __Serious __Moderate __Minor
• Rate the skill of your doctor:
__Excellent __Very Good __Good __Fair __Poor


Level of Measurement: Characteristics


Data Level, Operations, and Statistical Methods

Data Level   Meaningful Operations
Nominal      Classifying and Counting
Ordinal      All of the above plus Ranking
Interval     All of the above plus Addition, Subtraction, Multiplication, and
             Division (including means, standard deviations, etc.)
Ratio        All of the above
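
The table can be read as a guide to which summary fits which kind of data. The sketch below applies it to three of the hospital survey questions shown earlier; all responses are hypothetical, and pairing each level with a mode, median, or mean is a common convention assumed here rather than something the table states.

```python
from collections import Counter
import statistics

# Nominal (type of unit): classifying and counting, so report the mode
units = ["Surgical", "Maternity", "Surgical", "Medical", "Surgical"]
mode_unit = Counter(units).most_common(1)[0][0]

# Ordinal (skill of doctor): ranking is also allowed, so report the median category
scale = ["Poor", "Fair", "Good", "Very Good", "Excellent"]  # ordered low to high
ratings = ["Good", "Excellent", "Fair", "Good", "Very Good"]
ranks = sorted(scale.index(r) for r in ratings)
median_rating = scale[ranks[len(ranks) // 2]]  # middle rank; odd number of responses

# Ratio (days since release): arithmetic is allowed, so report mean and std. deviation
days_since_release = [3, 10, 7, 14, 6]
mean_days = statistics.mean(days_since_release)
sd_days = statistics.stdev(days_since_release)

print("Most common unit:", mode_unit)
print("Median rating:", median_rating)
print(f"Mean days since release = {mean_days}, standard deviation = {sd_days:.2f}")
```

Note that the ordinal ratings are never averaged; only their order is used.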


Level of Measurement: Statistical Tests


Exercise
• The Rathburn Manufacturing Company makes electric wiring,
which it sells to contractors in the construction industry.
Approximately 900 electric contractors purchase wire from
Rathburn annually.
• Rathburn’s director of marketing wants to determine electric
contractors’ satisfaction with Rathburn’s wire.
• He developed a questionnaire that yields a satisfaction score
between 10 and 50 for participant responses.
• A random sample of 35 of the 900 contractors is asked to
complete a satisfaction survey. The satisfaction scores for the
35 participants are averaged to produce a mean satisfaction
score.


Questions
• What is the population for this study?
• What is the sample for this study?
• What is the statistic for this study?
• What would be a parameter for this
study?


Example
Identify each of the following as examples of (1) nominal, (2) ordinal, (3) discrete, or (4) continuous variables:

• The length of time until a pain reliever begins to work.
• The brand of refrigerator in a home.
• Number of telephones per household.


Class Exercise
Q 1: Determine whether the variable is categorical or numerical. If numerical, determine whether the variable is discrete or continuous. Determine the level of measurement.

a. Amount of time spent shopping in the bookstore
b. Number of textbooks purchased
c. Academic major (specialization: Operations, HR,
OB, IT)
d. Gender

Class Exercise
Q 2: Determine whether the variable is categorical or numerical. If numerical, determine whether the variable is discrete or continuous. Determine the level of measurement.

a. Name of Internet service provider
b. Time in hours spent surfing the Internet per week
c. Number of emails received in a week
d. Number of online purchases made in a month


Exercise
Q 3: Suppose the following information is collected
from Mr. Ajay on his application for a home loan at
HDFC Bank. Classify each of the responses by type of
data and measurement scale
a. Monthly payments: $1,927.22
b. Number of jobs in past 10 years
c. Annual family income: $76,000.30
d. Marital status: Married


Class Exercise
Q 4: A manufacturer of dog food was planning to survey households in India to determine the purchasing habits of dog owners. Among the variables to be collected are:
• The primary place of purchase of dog food
• Whether dry or moist food is purchased
• Number of dogs living in the household
• Whether the dog is pedigreed


Thank You
