
Introduction & Basic Concepts in Statistics
Statistics is widely used in business and economics. It plays
an important role in the exploration of new markets for a
product, the forecasting of business trends, the control and
maintenance of high-quality products, the improvement of
employer-employee relationships, and the analysis of data
concerning insurance, investment, sales, employment,
transportation, communications, auditing and
accounting procedures.
STATISTICS is the branch of mathematics that deals
with the theory and method of collecting, organizing,
presenting, analyzing and interpreting data.

Two Main Divisions/Phases of Statistics


1. DESCRIPTIVE STATISTICS refers to summary statistics
that quantitatively describe or summarize features of a collection
of data under investigation. The goal is to describe. Numerical measures
are used to tell about features of a set of data.
Examples:

• The average, or measure of the center of a data set, consisting of the mean, median,
mode, or midrange

• The spread of a data set, which can be measured with the range or standard deviation

• Overall descriptions of data such as the five number summary

• Measurements such as skewness and kurtosis

• The exploration of relationships and correlation between paired data

• The presentation of statistical results in graphical form
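These descriptive measures can be sketched with Python's standard `statistics` module; the data set below is hypothetical:

```python
import statistics

data = [4, 8, 6, 5, 3, 2, 8, 9, 2, 5]        # hypothetical data set

mean = statistics.mean(data)                  # center: arithmetic average
median = statistics.median(data)              # center: middle value
modes = statistics.multimode(data)            # center: most frequent value(s)
data_range = max(data) - min(data)            # spread: range
sd = statistics.pstdev(data)                  # spread: population std deviation
print(mean, median, modes, data_range, round(sd, 2))
```

`multimode` is used rather than `mode` because this data set has several equally frequent values.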


2. INFERENTIAL STATISTICS – statistical tools that are used to
examine the relationships between variables within a sample and then
make generalizations or predictions about how those variables will
relate to a larger population.
Examples:

• A confidence interval gives a range of values for an unknown parameter of the population by
measuring a statistical sample. This is expressed in terms of an interval and the degree of
confidence that the parameter is within the interval.

• Tests of significance or hypothesis testing where scientists make a claim about the
population by analyzing a statistical sample. By design, there is some uncertainty in this
process. This can be expressed in terms of a level of significance.
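The confidence-interval idea can be sketched minimally in Python. The sample below is hypothetical, and the normal critical value 1.96 is used for simplicity (with a sample this small, a t critical value would be more appropriate):

```python
import math
import statistics

sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]  # hypothetical data

n = len(sample)
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean

# 95% confidence interval using the normal critical value z = 1.96
lower, upper = mean - 1.96 * sem, mean + 1.96 * sem
print(f"95% CI for the population mean: ({lower:.2f}, {upper:.2f})")
```

The interval, together with the 95% confidence level, expresses the uncertainty about the unknown population mean.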
Two Branches of Statistics
1. Statistical Theory – concerned with the formulation of
theories, principles, and formulas which are used as bases
in the solution of problems related to Statistics.
2. Statistical Methods – concerned with the application of
these theories, principles and formulas in the solution of
everyday problems.
OTHER STATISTICAL TERMS:
• POPULATION – a set of data consisting of all conceivable possible observations of a certain
phenomenon. It refers to the totality of the observations. The population size is denoted by the capital letter N.

• SAMPLE – a finite number of items selected from a population, possessing
characteristics identical with those of the population from which it was taken. The sample size is denoted by
the lowercase letter n.

• PARAMETERS – characteristics or measures computed from the population

• STATISTIC(S) – characteristics or measures computed from the sample


• VARIABLE – refers to a fundamental quantity that changes in value
from one observation to another within a given domain and under a
given set of conditions. Variables may be represented by the letters
X, Y, etc.
• DISCRETE VARIABLE – a variable whose value is obtained by
counting (e.g., number of students).
• CONTINUOUS VARIABLE – a variable whose value is obtained by
measuring (e.g., height, weight).
• CONSTANT – refers to fundamental quantities that do not change in
value.
Four Levels of Data Measurement
Nominal – also called the categorical variable scale, is defined as a
scale used for labeling variables into distinct classifications; it
involves no quantitative value or order. This scale is the
simplest of the four variable measurement scales. Calculations done
on these variables are futile because the options have no numerical
value. (ex. sex, gender, place of residence, political affiliation)
Ordinal – a variable measurement scale used simply to depict the
order of variables, not the differences between them.
These scales are generally used to depict non-mathematical
ideas such as frequency, satisfaction, happiness, degree of pain,
etc.
• The Ordinal Scale maintains descriptive qualities along with an intrinsic order but
lacks an origin of scale, so the distance between variables cannot be
calculated. Descriptive qualities indicate tagging properties similar to the nominal
scale; in addition, the ordinal scale gives each variable a relative position.
The scale has no origin, so there is no fixed start or "true zero".
• Examples:

• High school class ranking: 1st, 9th, 87th…

• Socioeconomic status: poor, middle class, rich.

• The Likert Scale: strongly disagree, disagree, neutral, agree, strongly agree.

• Level of Agreement: yes, maybe, no.

• Time of Day: dawn, morning, noon, afternoon, evening, night.

• Political Orientation: left, center, right.


• Interval Scale is defined as a numerical scale where the order of the variables is
known as well as the difference between these variables. Variables that have
familiar, constant, and computable differences are classified using the Interval
scale. It is easy to remember the primary role of this scale too, ‘Interval’ indicates
‘distance between two entities’, which is what Interval scale helps in achieving.
• These scales are effective as they open doors for the statistical analysis of
provided data. The mean, median, or mode can be used to calculate the central tendency
in this scale. The only drawback of this scale is that there is no pre-decided starting
point or true zero value.

• Interval scale contains all the properties of the ordinal scale, in addition to which, it
offers a calculation of the difference between variables. The main characteristic of
this scale is the equidistant difference between objects.
Interval Scale Examples

• There are situations where attitude scales are considered to be interval scales.

• Apart from the temperature scale, time is also a very common example of an interval scale as the values
are already established, constant, and measurable.

• Calendar years and time also fall under this category of measurement scales.

• Likert scale, Net Promoter Score, Semantic Differential Scale, Bipolar Matrix Table, etc.
are the most-used interval scale examples.

• Celsius Temperature.

• Fahrenheit Temperature.

• IQ (intelligence scale).

• SAT scores.

• Time on a clock with hands.


• Ratio Scale: 4th Level of Measurement

• is defined as a variable measurement scale that not only produces the order of
variables but also makes the difference between variables known along with
information on the value of true zero. It is calculated by assuming that the variables
have an option for zero, the difference between the two variables is the same and
there is a specific order between the options.

• With the option of true zero, varied inferential, and descriptive analysis techniques
can be applied to the variables. In addition to the fact that the ratio scale does
everything that a nominal, ordinal, and interval scale can do, it can also establish the
value of absolute zero. The best examples of ratio scales are weight and height. In
market research, a ratio scale is used to calculate market share, annual sales, the
price of an upcoming product, the number of consumers, etc.
• Examples of Ratio scale

• Age

• Weight

• Height

• Sales Figures

• Ruler measurements.

• Income earned in a week


STEPS IN A STATISTICAL INQUIRY OR INVESTIGATION

Start with a problem, then:

1. Collection of data

2. Presentation of data

3. Analysis of data

4. Interpretation of data
Data Collection and Data Presentation
What are DATA?

• Data are plain facts, usually raw numbers, words, measurements,
observations or just descriptions of things. Think of a spreadsheet full
of numbers with no meaningful description. In order for these
numbers to become information, they must be interpreted to have
meaning.
TWO TYPES OF DATA

1. QUALITATIVE DATA is descriptive in nature, ex. color, shapes

2. QUANTITATIVE DATA is numerical information, ex. weight, height


DATA COLLECTION

• Data collection is concerned with the accurate gathering of data;
although methods may differ depending on the field, the emphasis on
ensuring accuracy remains the same. The primary goal of any data collection is to
capture quality data or evidence that easily translates to rich data
analysis and may lead to credible and conclusive answers to the questions
that have been posed.
METHODS OF DATA COLLECTION

1. THE INTERVIEW or DIRECT METHOD


The researcher or interviewer gets the needed data
from the respondent or interviewee verbally and directly
through face-to-face contact.
2. THE QUESTIONNAIRE or INDIRECT METHOD
The questionnaire is a tool for data gathering and
research that consists of a set of questions in
various question formats, used to collect
information from respondents for the purpose of
either a survey or a statistical analysis study.
3. REGISTRATION METHOD
This method relies on records kept by the government, such as records of births at
the Philippine Statistics Authority (PSA) and registration records at the COMELEC.

4. OBSERVATION
This method is a way of collecting data through observing. The
observer gains firsthand knowledge by being in and around the social setting
that is being investigated.
5. EXPERIMENTATION
An experiment is a procedure carried out to support, refute, or validate
a hypothesis. An experiment is the method that most clearly shows cause-and-effect
because it isolates and manipulates a single variable in order to clearly
show its effect.
DATA PRESENTATION
Once data has been collected, it has to be classified and organized in such a way that it
becomes easily readable and interpretable, that is, converted to information.

TYPES OF DATA PRESENTATION


1. TEXTUAL PRESENTATION
This type of presentation combines text and figures in a statistical report.
Example: news item in the newspaper
2. TABULAR PRESENTATION
This type of presentation uses tables consisting of vertical columns and
horizontal rows, with headings describing these rows and columns. The data are
presented in a brief and orderly manner.
Example: frequency table
3. GRAPHICAL PRESENTATION
This is the most effective means of presenting statistical data because important
relationships are brought out more clearly in graphs.
DIFFERENT TYPES OF GRAPHS COMMONLY USED IN
DATA PRESENTATION

1. BAR GRAPH
A bar chart or bar graph is a chart or graph that presents categorical
data with rectangular bars with heights or lengths proportional to the values that they
represent. The bars can be plotted vertically or horizontally.
2. LINE GRAPH
A line graph is a graphical display of information that changes continuously over time.
A line graph may also be referred to as a line chart. Within a line graph, there are
points connecting the data to show a continuous change. The lines in a line graph can
descend and ascend based on the data. We can use a line graph to compare different
events, situations, and information.
3. PIE GRAPH
A pie chart is a circular chart divided into wedge-like sectors, illustrating
proportion. Each wedge represents a proportionate part of the whole, and the total
value of the pie is always 100 percent.
Pie charts can make the size of portions easy to understand at a glance.
They're widely used in business presentations and education to show the proportions
among a large variety of categories including expenses, segments of a population, or
answers to a survey.
4. SCATTER DIAGRAM
A scatter diagram, also called a scatterplot, is a type of plot or mathematical
diagram using Cartesian coordinates to display values for typically two variables for a
set of data. If the points are coded (color/shape/size), one additional variable can be
displayed. The data are displayed as a collection of points, each having the value of one
variable determining the position on the horizontal axis and the value of the other
variable determining the position on the vertical axis.
5. PICTOGRAPH/PICTOGRAM
A pictograph is a chart or graph, which uses pictures to represent data. A pictograph
is one of the simplest forms of data visualization.
Two types of Sampling
• Probability sampling
• Simple random
• Systematic
• Stratified
• Cluster
• Non-probability sampling
• Convenience/Accidental
• Judgmental/Purposive
• Quota
• Snowball
Probability vs non-probability sampling
1. Probability or Random Sampling
Provides equal chances for every single element of the population to be
included in the sample.

2. Non-Probability Sampling
The samples are selected in a process that does not give all the
individuals in the population equal chances of being selected.
Samples are selected on the basis of their accessibility or by the
purposive personal judgment of the researcher.
Probability-based Sampling

Simple Random Sampling

• Lottery Method
• Fish Bowl Method
• Table of Random Numbers
Probability-based Sampling
Systematic Sampling
Step 1. Identify the population (N)
Step 2. Identify the sample size (n) to be drawn from the
population
Step 3. Divide N by n to find the sampling interval k

Example
Population is 1,000. Desired sample size is 100. Sampling interval is 10.
Choose a random start from 1 to 10 in the list as the first sample, then take every 10th element
in the list
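The steps above can be sketched in Python; the population here is simply the numbers 1 to 1,000:

```python
import random

N = 1000                            # population size
n = 100                             # desired sample size
k = N // n                          # sampling interval: 10

population = list(range(1, N + 1))
start = random.randint(1, k)        # random start between 1 and k
sample = population[start - 1::k]   # that element, then every k-th one after it
print(len(sample))
```

Whatever the random start, the slice step guarantees exactly N/k = 100 elements are selected.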
Probability-based Sampling
Stratified Sampling
Used to ensure that different groups in the population are adequately represented in the sample
Step 1. Identify the population and divide it into different groups or strata according to some criterion.
Step 2. Decide on the sample size or the actual percentage of the population to be taken as the sample.
Step 3. Get a proportional sample from each group.
Step 4. Select the respondents by random sampling.

Example : Population = 2000 Desired Sample Size = 10%


Proportion of sample per stratum = 10%
500 students x .10 = 50
600 businessman x .10 = 60
400 teachers x .10 = 40
500 farmers x .10 = 50
Total sample = 200
Select the 200 by random sampling.
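The stratified example above can be sketched in Python (the student roster names are hypothetical):

```python
import random

strata = {"students": 500, "businessmen": 600, "teachers": 400, "farmers": 500}
rate = 0.10                                        # 10% of each stratum

# Proportional sample size per stratum
sample_sizes = {group: round(size * rate) for group, size in strata.items()}
print(sample_sizes, sum(sample_sizes.values()))    # total sample = 200

# Step 4: within each stratum, members are chosen by simple random sampling
students = [f"student_{i}" for i in range(strata["students"])]  # hypothetical roster
chosen = random.sample(students, sample_sizes["students"])
```

Each stratum contributes in proportion to its size, so the 200-person sample mirrors the makeup of the 2,000-person population.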
Probability-based Sampling

Cluster Sampling
Often called geographic sampling
Used in large-scale surveys
The population is divided into multiple groups called clusters.
The clusters are selected with a simple random or systematic
sampling technique for data collection and data analysis.
Example: the population includes the elementary schools in a province.
The province is first divided into districts, which are treated as clusters
and are randomly selected. From those districts, schools are picked
out at random, then classes, and then students are selected at random.
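A minimal sketch of the two-stage cluster idea, with hypothetical district and school names:

```python
import random

# Hypothetical frame: 12 districts (clusters), each listing its schools
districts = {f"district_{d}": [f"school_{d}_{s}" for s in range(8)]
             for d in range(12)}

# Stage 1: randomly select 4 districts (the clusters)
chosen_districts = random.sample(list(districts), 4)

# Stage 2: within each chosen district, randomly select 2 schools
chosen_schools = [school
                  for d in chosen_districts
                  for school in random.sample(districts[d], 2)]
print(chosen_schools)
```

Further stages (classes, then students) would repeat the same random selection inside each chosen school.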
Non-Probability Sampling

1. Accidental or Convenience Sampling


Researcher selects subjects that are more readily accessible or
available.

2. Purposive Sampling
Subjects are selected based on the needs of the study.
Non-Probability-based Sampling
3. Quota Sampling
Researcher takes a sample that is in proportion to some characteristic or
trait of the population
The population is divided into groups or strata (the basis may be age,
gender, education level, race, religion, etc.).
Samples are taken from each group to meet a quota.
Care is taken to maintain the correct proportions representative of the
population.

Example :
The population consists of 60% female and 40% male.
The desired sample size is 200.
Therefore, the sample should consist of ____ females and ____ males.
Non-Probability-based Sampling
A study on science teaching is to be conducted in high schools of a region.
There are 4,641 teachers grouped according to area of specialization.
There are 2,243 biology teachers, 1,406 chemistry teachers and 992 physics
teachers.
The desired sample size is 300.
Select the sample according to the Quota Sampling technique.
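One way to work out the quotas for this exercise is proportional allocation, sketched below. Rounding to whole teachers happens to sum exactly to 300 here; in general, small adjustments may be needed:

```python
specializations = {"biology": 2243, "chemistry": 1406, "physics": 992}
N = sum(specializations.values())     # 4,641 teachers in all
n = 300                               # desired sample size

# Proportional quota per group, rounded to the nearest whole teacher
quotas = {group: round(size / N * n) for group, size in specializations.items()}
print(quotas)
```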
Non-Probability-based Sampling
4. Snowball Sampling
This type of sampling starts with known sources of information, who or
which will in turn point to other sources of information. As this goes on,
data accumulate.

It is used to reach socially devalued urban populations such as drug
addicts, alcoholics, child abusers and criminals, because they are usually
hidden from outsiders.
SUMMATION NOTATIONS
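Summation notation Σ compactly denotes the sum of indexed values; for instance, Σxi for i = 1 to n adds up all n observations. A small illustration with hypothetical observations:

```python
x = [3, 5, 8, 4]                      # hypothetical observations x1..x4

sum_x = sum(x)                        # Σxi  = 3 + 5 + 8 + 4 = 20
sum_x_sq = sum(v ** 2 for v in x)     # Σxi² = 9 + 25 + 64 + 16 = 114
sq_sum_x = sum(x) ** 2                # (Σxi)² = 400; note Σxi² ≠ (Σxi)²
print(sum_x, sum_x_sq, sq_sum_x)
```

The distinction between Σxi² and (Σxi)² matters later when computing variances from frequency tables.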
FREQUENCY DISTRIBUTION
Frequency Distribution
When the researcher gathers all the needed data, the next
task is to organize and present them with the use of
appropriate tables and graphs. Frequency distribution is one
system used to facilitate the description of important
features of the data.
Frequency Distribution Table (FDT) is a tabular arrangement
of data showing its classification or grouping according to
magnitude or size.
Example: The scores of 70 students in a quiz could be organized into the
following frequency table.
Scores   Frequency (f)   Class Mark (Xi)
10-19         12              14.5
20-29         14              24.5
30-39         25              34.5
40-49         11              44.5
50-59          8              54.5

• COMPONENTS OF A FREQUENCY DISTRIBUTION


CLASS INTERVALS or CLASS LIMITS (Ci) – the end numbers of a class. The smallest value of the
class is called the lower class limit and the largest value of the class is called the upper class
limit. Class limits are also called inclusive classes.
• Example: Take the class interval 10 – 19. The smallest value, 10, is the lower class
limit and the largest value, 19, is the upper class limit.
CLASS FREQUENCY (F) – the number of observations belonging to a class
interval.
In the sample FDT, the class frequencies are found in the second column; the
entries are based on the given data. For the interval 10-19, the corresponding
frequency is 12.

CLASS MARK (Xi) – also known as the midpoint or the middle value of a class
interval. It is the average of the lower and upper limits of each class.

For the class interval 10 – 19, the class mark is (10 + 19)/2 = 14.5.
CLASS BOUNDARIES (CB) – the true values which describe the actual limits
of a class. They can be obtained by simply adding 0.5 to the upper limit and
subtracting 0.5 from the lower limit of each class.
In the sample FDT, for the class interval 10 – 19, the lower class boundary is
9.5 and the upper class boundary is 19.5.

CLASS/INTERVAL SIZE (i) – the width of each class interval. The difference
between two successive class limits, two successive class marks, or two
successive class boundaries is called the class or interval size.

Taking successive lower class limits from the sample FDT: the difference
between 10 and 20 is 10; between 20 and 30 is 10; etc.
RELATIVE FREQUENCY (RF) – the ratio of the number of observations to the total
number of observations or the frequency expressed in percent
Scores   Frequency (F)   Relative Frequency (RF)   Class Mark (Xi)   Class Boundaries (CB)   Less than Cumulative Frequency (<CF)
10-19         12                  17                    14.5              9.5-19.5                      12
20-29         14                  20                    24.5             19.5-29.5                      26
30-39         25                  36                    34.5             29.5-39.5                      51
40-49         11                  16                    44.5             39.5-49.5                      62
50-59          8                  11                    54.5             49.5-59.5                      70
Total         70                 100
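The relative frequency and less-than cumulative frequency columns can be recomputed with a short Python sketch:

```python
freq = [12, 14, 25, 11, 8]           # frequencies for 10-19, 20-29, ..., 50-59
n = sum(freq)                        # 70

# Relative frequency in percent, rounded as in the table
rf = [round(f / n * 100) for f in freq]

# Less-than cumulative frequency: running total of the frequencies
lcf, running = [], 0
for f in freq:
    running += f
    lcf.append(running)

print(rf)    # relative-frequency column
print(lcf)   # <CF column
```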
Constructing the Frequency Distribution
Given: a sample of 40 car battery lives. Let us construct the FDT for the given data.

2.2 4.1 3.5 4.5 3.2 3.7 3.0 2.6

3.4 1.6 3.1 3.3 3.8 3.1 4.7 3.7

2.5 4.3 3.4 3.6 2.9 3.3 3.9 3.1

3.3 3.1 3.7 4.4 3.2 4.1 1.9 3.4

4.7 3.8 3.2 2.6 3.9 3.0 4.2 3.5


Steps in constructing a Frequency Distribution Table
1. The range refers to the difference between the highest and the lowest scores.
Range = Highest score - Lowest score
R = 4.7 – 1.6
R = 3.1

2. Determine the tentative number of classes k using the formula below. The number of class
intervals should be neither too few nor too many.
k = √n
k = √40 = 6.32
3. Find the interval/class size
i = R/k
i = 3.1/6.32 = 0.49 ≈ 0.5
4. Construct the class intervals, starting from 1.5, with 0.5 as the class size.
5. Tally the frequencies for each interval and sum them.
6. Compute the relative frequencies (RF)
RF = f/n (100) = 2/40 (100) = 5, etc.
7. Find the class marks (Xi) of the class intervals.

Xi = (1.5 + 1.9)/2 = 1.7


Class Interval   Tally               Frequency (Fi)   Class Mark (Xi)   FiXi    |Xi-X̄|   Fi|Xi-X̄|   Xi²     FiXi²
1.5 – 1.9        II                         2              1.7           3.4     1.71      3.42      2.89     5.78
2.0 – 2.4        I                          1              2.2           2.2     1.21      1.21      4.84     4.84
2.5 – 2.9        IIII                       4              2.7          10.8     0.71      2.84      7.29    29.16
3.0 – 3.4        IIIII IIIII IIIII         15              3.2          48       0.21      3.15     10.24   153.6
3.5 – 3.9        IIIII IIIII               10              3.7          37       0.29      2.9      13.69   136.9
4.0 – 4.4        IIIII                      5              4.2          21       0.79      3.95     17.64    88.2
4.5 – 4.9        III                        3              4.7          14.1     1.29      3.87     22.09    66.27
Total                                      40                          136.5     6.21     21.34             484.75

(where X̄ = ΣFiXi / n = 136.5/40 ≈ 3.41)
8. Compute the class boundaries (CB)
CB = (UCLi-1 + LCLi)/2
CB = (1.4 + 1.5)/2 = 1.45
9. Compute the less than cumulative frequencies (<CF)
10. Compute the greater than cumulative frequencies (>CF)
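The construction steps above (range, number of classes, class size, tally) can be sketched in Python for the battery-life data:

```python
import math

battery_lives = [
    2.2, 4.1, 3.5, 4.5, 3.2, 3.7, 3.0, 2.6,
    3.4, 1.6, 3.1, 3.3, 3.8, 3.1, 4.7, 3.7,
    2.5, 4.3, 3.4, 3.6, 2.9, 3.3, 3.9, 3.1,
    3.3, 3.1, 3.7, 4.4, 3.2, 4.1, 1.9, 3.4,
    4.7, 3.8, 3.2, 2.6, 3.9, 3.0, 4.2, 3.5,
]

n = len(battery_lives)                                 # 40
R = round(max(battery_lives) - min(battery_lives), 1)  # range: 4.7 - 1.6 = 3.1
k = math.sqrt(n)                                       # tentative classes: 6.32
i = round(R / k, 1)                                    # class size: 0.49 -> 0.5

# Tally frequencies for the intervals 1.5-1.9, 2.0-2.4, ..., 4.5-4.9
lowers = [round(1.5 + i * j, 1) for j in range(7)]
freq = [sum(1 for x in battery_lives if lo <= x <= round(lo + 0.4, 1))
        for lo in lowers]
print(freq)  # frequency column of the FDT
```

Rounding the interval endpoints to one decimal place avoids floating-point mismatches when tallying.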
Frequency polygon – graph showing the relationship between the
frequencies and the class marks
Histogram – shows the relationship between the frequencies and the
class boundaries
<Ogive – shows the relationship between the <CF and the upper class
boundaries
>Ogive – shows the relationship between the >CF and the lower class
boundaries
MEAN, MEDIAN, MODE AND OTHER MEASURES OF POSITION
PROPERTIES OF THE MEAN
• The mean is always a unique value in any set of data
• The mean is associated with the interval/ratio data
• The mean is strongly influenced by the extreme values in a
set of data
• The mean is the most reliable measure of central tendency
PROPERTIES OF THE MEDIAN
• Like the mean, the median is also a unique value in any set of data
• The median is associated with ordinal data
• The median value is not affected by the extreme values
• The median is a function only of the middle value (when n is odd) or
the average of the two middle values (when n is even) when the
data are arranged from the highest value to the lowest value or
vice versa
• The median is a positional measure
PROPERTIES OF THE MODE
• The mode is not affected by the extreme values
• It may not exist
• If the mode exists, it may not always be unique
• In finding the mode, we do not consider all the values in the
distribution
• The mode is associated with nominal data
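A quick illustration of the mean's sensitivity to extreme values versus the median's robustness (hypothetical data):

```python
import statistics

data = [5, 6, 7, 8, 9]
print(statistics.mean(data), statistics.median(data))     # both are 7

# Replace the largest value with an extreme outlier:
data_outlier = [5, 6, 7, 8, 90]
print(statistics.mean(data_outlier))      # 23.2, pulled strongly upward
print(statistics.median(data_outlier))    # still 7, unaffected by the extreme
```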
MEASURES OF VARIATION/DISPERSION/VARIABILITY
Absolute Measures:
• Range
• MAD – Mean Absolute Deviation
• IQR – Interquartile Range
• QD – Quartile Deviation
• Variance
• Standard Deviation

Relative Measures:
• CV – Coefficient of Variation
• CQD – Coefficient of Quartile Deviation
• Z score – Standard Score
Interpreting the coefficient of skewness (Sk):
• Sk is positive – positively skewed
• Sk is negative – negatively skewed
• Sk = 0 – normal/symmetrical curve
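Several of the absolute and relative measures listed above can be sketched with the standard `statistics` module (hypothetical data):

```python
import statistics

data = [10, 12, 15, 11, 13, 14, 12, 13]       # hypothetical data set

rng = max(data) - min(data)                   # range: 5
var = statistics.pvariance(data)              # population variance
sd = statistics.pstdev(data)                  # population standard deviation
mean = statistics.mean(data)
cv = sd / mean * 100                          # coefficient of variation, in %
z = (15 - mean) / sd                          # z-score of the value 15
print(rng, var, sd, round(cv, 1), round(z, 2))
```

The CV and z-score are relative measures: they express spread or position in units of the data's own mean and standard deviation.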
