
STATISTICS INTRODUCTION & DEFINITION

Nestor G. Gutiza Jr.


Asst. Prof. 3
Sorsogon State University
INTRODUCTION
 The science of conducting studies to collect, organize,
summarize, analyze, and draw conclusions from data is
called statistics.
 It is used in almost all fields of human endeavor, such as
sports, education, health, and research, among others.
Statistical analyses are used to manipulate, summarize,
and investigate data to produce information that is useful
for decision-making.
 Sir Ronald Aylmer Fisher (February 17, 1890 – July 29,
1962) was a British statistician and geneticist who pioneered
the application of statistical procedures to the design of
scientific experiments.
 He is considered the Father of Modern Statistics.

 In 1909, he was awarded a scholarship to study
mathematics at the University of Cambridge. In 1912, he
graduated with a B.A. in Astronomy. He continued to
study astronomy and physics at the university, including
the theory of errors, which connected him to statistics.
 From 1914 to 1919, he taught high school mathematics
and physics while continuing his research in statistics
and genetics. In 1918, he published an important paper
in which he used powerful statistical tools to reconcile
inconsistencies between Charles Darwin’s ideas of
natural selection and the rediscovered experiments of
Austrian botanist Gregor Mendel.
 In 1919, he became the statistician for the Rothamsted
Experimental Station and did statistical work associated
with plant-breeding experiments, which led to theories
about gene dominance and fitness.
 From 1943 until 1957, he was Balfour Professor of
Genetics at Cambridge.
 He investigated the linkage of genes for different traits
and developed methods of multivariate analysis to deal
with such questions.
 To avoid bias in the selection of experimental materials
(which makes results inaccurate and misleading), he
introduced the principle of randomization.
 In this way, random selection is used to diminish the
effects of variability in experimental materials.
 One of the most important achievements of Fisher is the
concept of analysis of variance, or ANOVA.
TYPES OF STATISTICS
 1. Descriptive Statistics - consists of methods for the
collection, organization, summarization, and
presentation of data or information.
Example: construction of graphs, charts, and tables, and
the calculation of various descriptive measures such as
averages, measures of variation, and percentiles.
 2. Inferential Statistics - consists of methods for drawing and
measuring the reliability of conclusions about a population
based on information obtained from a sample of the
population.
 After the collection, organization, summarization, and
presentation of data (descriptive statistics), inferential statistics
is used to determine the findings and draw conclusions.
 This means that descriptive statistics and inferential
statistics are interrelated. Use descriptive statistics to
organize and summarize the information obtained from a
sample before carrying out inferential statistics.
Descriptive statistics leads us to the appropriate inferential
method.
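The descriptive measures named above can be sketched in a few lines of Python using only the standard library. The scores below are hypothetical values chosen for illustration, and the standard error at the end is just one simple bridge from description to inference, not the only one.

```python
import statistics

# Hypothetical sample of exam scores (illustrative values, not from the lecture)
sample = [72, 85, 90, 66, 78, 94, 81, 70]

# Descriptive statistics: summarize the sample itself
mean = statistics.mean(sample)                  # average
median = statistics.median(sample)              # middle value
stdev = statistics.stdev(sample)                # sample standard deviation (n - 1)
quartiles = statistics.quantiles(sample, n=4)   # three cut points: Q1, Q2, Q3

# A first inferential step: the standard error of the mean estimates how far
# the sample mean may lie from the unknown population mean
std_error = stdev / len(sample) ** 0.5

print(f"mean={mean}, median={median}")
print(f"stdev={stdev:.2f}, quartiles={quartiles}")
print(f"standard error of the mean={std_error:.2f}")
```

The first four values describe the sample only. The standard error is where the analysis starts to say something about the population the sample came from.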
POPULATION AND SAMPLE
 Population - the collection of all individuals or items
under consideration in a statistical study.
 Sample – the part of the population from which
information is obtained. For example, consider a certain study
about Statistics University, which has 6,589 students. The
6,589 students are the population. If the researcher
randomly selects class A with 44 students, the 44
students are the sample. A sample is meant to be
representative of the population.
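As a rough sketch of drawing a sample from a population, the snippet below picks 44 students at random from 6,589. Note two assumptions: the student ID numbers are hypothetical, and students are drawn individually (simple random sampling) rather than by selecting one whole class as in the example above.

```python
import random

POPULATION_SIZE = 6589  # all students, from the example above
SAMPLE_SIZE = 44        # sample size, from the example above

# Hypothetical student ID numbers 1..6589 stand in for the population
population = range(1, POPULATION_SIZE + 1)

rng = random.Random(2024)                      # arbitrary seed so the draw is repeatable
sample = rng.sample(population, SAMPLE_SIZE)   # draw without replacement

sampling_fraction = SAMPLE_SIZE / POPULATION_SIZE
print(f"first five sampled IDs: {sorted(sample)[:5]}")
print(f"sampling fraction: {sampling_fraction:.4f}")
```

Because the draw is without replacement, no student can appear in the sample twice.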
BEFORE WE GO THROUGH THE DISCUSSIONS, LET
US FIRST DEFINE SOME BASIC OPERATIONAL
TERMS IN STATISTICS:
 Variable – a characteristic or attribute that can assume
different values; any characteristic, number, or quantity
that can be measured or counted. It is also called a data
item. The information collected for variables describes the
situation.
 Examples: age, sex, business income and expenses, birth,
expenditure, class grades, and eye color, among others.
TYPES OF VARIABLES
 1. Numeric Variables/ Quantitative Variables – have values
that describe a measurable quantity as a number, like ‘how
many’ or ‘how much’. These are quantifiable variables.
Data collected for a numeric variable are called quantitative
data.
 a. Continuous Variable – observations can take any value
within a certain range of real numbers. The value given to an
observation for a continuous variable can include values as
small as the instrument of measurement allows.
 Examples: height, time, age, and temperature.
 Height can be 1.62 m, time can be 3.5 hours (3 hours and 30
minutes), age can be 16 3/4 years old (16 years and 9 months),
and temperature can be 36 2/5 ℃ or 36.40 ℃.
 b. Discrete Variable
Observations can take a value based on a count from
a set of distinct whole values. A discrete variable cannot
take the value of a fraction between one value and the
next closest value.
Examples: number of registered cars, number of business
locations, and number of children in a family, all of
which are measured in whole units (e.g. 1, 2, 3 cars).
 2. Categorical Variables/ Qualitative Variables – have
values that describe a 'quality' or 'characteristic' of a data
unit, like 'what type' or 'which category'. Categorical
variables fall into mutually exclusive (in one category or
in another) and exhaustive (include all possible options)
categories. Categorical variables are therefore
qualitative variables and tend to be represented by a
nonnumeric value. Data collected for a categorical variable
are called qualitative data.
 a. Ordinal Variable
 Observations can take a value that can be logically
ordered or ranked. The categories associated with ordinal
variables can be ranked higher or lower than another, but
do not necessarily establish a numeric difference
between each category.
 Examples: academic grades (e.g. A, B, C), clothing size
(e.g. small, medium, large, extra-large), and attitudes (e.g.
strongly agree, agree, disagree, and strongly disagree).
 b. Nominal Variable
 Observations can take a value that cannot be
organized in a logical sequence.
 Examples: sex, business type, eye color, religion, and
brand.
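The difference between ordinal and nominal variables can be illustrated with a short sketch. The clothing-size order comes from the examples above. The observed values are hypothetical. Ordinal values can be sorted by rank, while nominal values can only be counted.

```python
from collections import Counter

# Ordinal: categories have a logical order (sizes taken from the examples above)
SIZE_ORDER = ["small", "medium", "large", "extra-large"]
observed_sizes = ["large", "small", "extra-large", "medium", "small"]  # hypothetical
sorted_sizes = sorted(observed_sizes, key=SIZE_ORDER.index)  # sort by rank, not alphabet

# Nominal: categories have no order, so counting is the natural summary
observed_eye_colors = ["brown", "blue", "brown", "green", "brown"]  # hypothetical
color_counts = Counter(observed_eye_colors)
most_common_color = color_counts.most_common(1)[0][0]

print(sorted_sizes)
print(color_counts)
print(most_common_color)
```

Sorting eye colors alphabetically would be possible, but the resulting order would carry no meaning, which is what makes the variable nominal.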
[Figure: diagram of the types of variables]
DATA
 Data – values (measurements or observations) that the
variables can assume. Variables whose values are
determined by chance are called random variables.
 Data Set – collection of data
 Data Value or Datum – each value in the data set
 Quantitative data – data from numeric/ quantitative
variables; quantifiable data
 Qualitative data – data from categorical/ qualitative
variables; non-numeric data
 Discrete data – data from discrete variables;
non-fractional data
 Continuous data – data from continuous variables; data
from the set of real numbers.
 For example, the grades of 5 students in Statistics are 94,
75, 82.5, 74.9, and 89.
 In the example above, the grades of the students are the
variable. As a numeric variable, it is classified as a continuous
variable since it can take decimal or fractional values.
 Furthermore, 94, 75, 82.5, 74.9, and 89 form the data set. Each
value is a data value or datum (e.g. 94 is a data value or
datum). These data are continuous data since they can come from
the set of real numbers.
 Moreover, besides the qualitative and quantitative distinction,
variables can also be classified by how they are categorized or
measured: their measurement scale, or level of measurement.
LEVEL OF MEASUREMENT
 1. Nominal level of measurement – classifies data into mutually
exclusive (non-overlapping) categories in which no order or
ranking can be imposed on the data. Nominal data are
countable.
 Example: gender, zip codes; political party; religion; nationality

 2. Ordinal level of measurement – classifies data into categories
that can be ranked; however, precise differences between the
ranks do not exist. Ordinal data contain more information than
nominal data. They consist of distinct categories in which order
is implied. Values in one category are larger or smaller than
values in other categories (e.g. rating: excellent, good, fair, poor).
 Example: evaluation (superior, average, poor); ranking (first,
second, etc.); letter grades (A, B, C, D, E, F)
 3. Interval level of measurement – ranks data, and precise
differences between units of measure do exist; however, there is
no meaningful zero. It is a set of numerical measurements in which
the distance between numbers is of a known, constant size.
 Example: IQ level; temperature
 There is a meaningful difference of 1 point between an IQ of
109 and an IQ of 110. Temperature is another example of
interval measurement, since there is a meaningful difference of
1°F between each unit, such as 72°F and 73°F.
 One property is lacking in the interval scale: There is no true
zero. For example, IQ tests do not measure people who have no
intelligence. For temperature, 0°F does not mean no heat at all.
 4. Ratio level of measurement
 Possesses all the characteristics of interval measurement, and there
exists a true zero or non-arbitrary zero point. In addition, true
ratios exist when the same variable is measured on two different
members of the population. Consists of numerical measurements
where the distance between numbers is of a known, constant size.
 Example: height; weight; area; number of phone calls
 Because there exists a true or non-arbitrary zero point, a value of
zero for weight, height, area, or phone calls is meaningful: it implies
that the thing does not exist.
 For example, if one person can lift 200 pounds and another can lift
100 pounds, then the ratio between them is 2 to 1. Put another way,
the first person can lift twice as much as the second person.
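The contrast between the interval and ratio levels can be checked numerically. In the sketch below, the weights come from the lifting example above, while the two Fahrenheit temperatures are hypothetical. The conversion to kelvin is used only because kelvin is a temperature scale with a true zero.

```python
def fahrenheit_to_kelvin(temp_f):
    """Convert degrees Fahrenheit to kelvin, a scale with a true zero."""
    return (temp_f - 32) * 5 / 9 + 273.15

# Ratio level: weights lifted, from the example above
weight_ratio = 200 / 100          # 2.0, so "twice as much" is a valid statement

# Interval level: two hypothetical temperatures in degrees Fahrenheit
warm_f, cool_f = 80.0, 40.0
difference_f = warm_f - cool_f    # 40.0, differences are meaningful
naive_ratio = warm_f / cool_f     # 2.0, but this does NOT mean "twice as hot"

# On a scale with a true zero, the same two temperatures are far from 2 to 1
kelvin_ratio = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)

print(f"weight ratio: {weight_ratio}")
print(f"temperature difference: {difference_f} degrees F")
print(f"naive Fahrenheit ratio: {naive_ratio}, kelvin ratio: {kelvin_ratio:.3f}")
```

The Fahrenheit ratio changes when the zero point changes, which is why ratios are only trusted at the ratio level.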
 There is not complete agreement among statisticians
about the classification of data into one of the four
categories. For example, some researchers classify IQ
data as ratio data rather than interval. Also, data can be
altered so that they fit into a different category. For
instance, if the incomes of all professors of a college are
classified into the three categories of low, average, and
high, then a ratio variable becomes an ordinal variable.
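The income example can be sketched as a small binning function. The cutoff values and the incomes are hypothetical, chosen only to show how a ratio variable turns into an ordinal one once exact amounts are replaced by ranked categories.

```python
def income_category(income, low_cutoff=30_000, high_cutoff=70_000):
    """Map a numeric income (ratio level) to a ranked label (ordinal level).

    The cutoffs are hypothetical; a real study would have to justify its own.
    """
    if income < low_cutoff:
        return "low"
    if income <= high_cutoff:
        return "average"
    return "high"

# Hypothetical incomes of four professors
incomes = [25_000, 48_500, 91_000, 30_000]
categories = [income_category(value) for value in incomes]

print(categories)
```

After binning, the order low < average < high is kept, but the exact differences and ratios between incomes are lost.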
DATA COLLECTION AND SAMPLING TECHNIQUES
 Sampling techniques were developed to mathematically determine
the most effective way to acquire a sample that would accurately
reflect the population of the study.
 The most common mathematical formula for determining
the sample size in reference to the population is
Slovin’s Formula, which was introduced by Slovin in 1960.
To this day, it is still unknown who Slovin really is;
many names are associated with the formula, such as Mark Slovin,
Michael Slovin, or Kulkol Slovin.
SLOVIN’S FORMULA

Use Slovin’s formula if you have no idea about the population’s behavior. Slovin’s
formula determines the sample size in proportion to the population. Slovin’s formula is
applicable only when estimating a population proportion and when the confidence
coefficient is 95%. There are other sampling formulae that could be used to determine
sample sizes in relation to the characteristics of the variables.
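The formula itself does not appear in the text above. It is commonly written as follows, where n is the sample size, N is the population size, and e is the margin of error. These symbol names are the conventional ones and are assumed here, not taken from the lecture.

```latex
n = \frac{N}{1 + N e^{2}}
```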
 In most educational and scientific research, a 0.05
margin of error (level of significance) is used.
 The margin of error tells by how many percentage points
your results may differ from the real population value. For
example, a 0.05 (5%) level of significance implies a
0.95 (95%) confidence level.
 Example: Assume a certain study is to be conducted in a
certain community with 6,518 residents. Determine the
number of respondents of the study at a 5% level of
significance using Slovin’s formula.
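A minimal sketch of the example, assuming the usual statement of Slovin’s formula, n = N / (1 + N e²), with N the population size and e the margin of error. The function name is hypothetical, and rounding up to the next whole respondent is a common convention, not something the text specifies.

```python
import math

def slovin_sample_size(population_size, margin_of_error):
    """Sample size by Slovin's formula: n = N / (1 + N * e**2), rounded up."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    raw_size = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(raw_size)  # round up: a fraction of a respondent is not possible

# Community of 6,518 residents at a 0.05 margin of error (from the example above)
n = slovin_sample_size(6518, 0.05)
print(n)  # 6518 / (1 + 6518 * 0.0025) = 6518 / 17.295, about 376.87, rounded up to 377
```

Under these assumptions, the study would need about 377 respondents.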
End of Part 1
