Statistics
Collection of methods for planning experiments, obtaining data, and then organizing,
summarizing, presenting, analyzing, interpreting, and drawing conclusions.
Variable
Characteristic or attribute that can assume different values
Random Variable
A variable whose values are determined by chance.
Population
All subjects possessing a common characteristic that is being studied.
Sample
A subgroup or subset of the population.
Parameter
Characteristic or measure obtained from a population.
Statistic (not to be confused with Statistics)
Characteristic or measure obtained from a sample.
1. Descriptive Statistics
Collection, organization, summarization, and presentation of data.
2. Inferential Statistics
Generalizing from samples to populations using probabilities. Performing hypothesis
testing, determining relationships between variables, and making predictions.
Qualitative Variables
Variables which assume non-numerical values.
Quantitative Variables
Variables which assume numerical values.
Discrete Variables
Variables which assume a finite or countable number of possible values. Usually
obtained by counting.
Continuous Variables
Variables which assume an infinite number of possible values. Usually obtained by
measurement.
Nominal Level
Level of measurement which classifies data into mutually exclusive, all inclusive
categories in which no order or ranking can be imposed on the data.
Ordinal Level
Level of measurement which classifies data into categories that can be ranked.
Precise differences between the ranks are not meaningful.
Interval Level
Level of measurement which classifies data that can be ranked and differences are
meaningful. However, there is no meaningful zero, so ratios are meaningless.
Ratio Level
Level of measurement which classifies data that can be ranked, differences are
meaningful, and there is a true zero. True ratios exist between the different units of
measure.
Random Sampling
Sampling in which the data is collected using chance methods or random numbers.
Systematic Sampling
Sampling in which data is obtained by selecting every kth object.
Convenience Sampling
Sampling in which data that is readily available is used.
Stratified Sampling
Sampling in which the population is divided into groups (called strata) according to
some characteristic. Each of these strata is then sampled using one of the other
sampling techniques.
Cluster Sampling
Sampling in which the population is divided into groups (usually geographically). Some
of these groups are randomly selected, and then all of the elements in those groups
are selected.
Statistics: Introduction
Statistics is a branch of mathematics that deals with the theory and method of
collecting, organizing, presenting, analyzing, and interpreting data. Statistical data may be
qualitative or quantitative. Data gathering includes gathering
information through interviews, questionnaires, objective observation, experimentation,
psychological tests, and other methods.
In the field of education, statistical tools are used to gather data and information on
enrolment, finance, and physical facilities that are essentially needed to have an effective
administration and management. When an individual analyzes achievement grades, prepares
tests, or provides solutions for the teaching-learning process, statistical tools or techniques are
utilized.
In the field of business and economics, statistics plays an important role in the exploration of
new markets for a product, forecasting of business trends, control and maintenance of high-
quality products, improvement of employee-employer relationship and analysis of data
concerning insurance, investment, sales, employment, transportation, communications,
auditing and accounting procedures and the like.
In the field of science and technology, discoveries as well as inventions are made
possible through scientific experiments. The cause and effect of different variables affecting
experiments are best analysed by means of statistical techniques.
In psychology, statistical tools are used to organize data on intelligence scores,
attitudes, personality traits, ratings, aptitudes, values, behaviour, pattern, etc.
In the government, various records are collected, organized and analysed statistically
for intelligent policy-making. Some examples of these records are taxes, natural resources,
movement of population, income, expenditure, budgets, and many more.
Statistics is a very important tool in research and studies. Statistical designs and
experiments are utilized to gather more information from a limited body of observation.
Various statistical techniques are used in the laboratories, experimental fields, or under
controlled conditions. The utilization of these tools in statistics is needed so that accurate
and reliable results are determined.
Thus, the study of statistics requires primarily the understanding of basic concepts,
symbols, and mathematical notations.
Descriptive Statistics
Descriptive statistics deals with the collection and presentation of data. This is
usually the first part of a statistical analysis. It is usually not as simple as it sounds: the
statistician needs to know how to design experiments, choose the right focus group, and
avoid the biases that can so easily creep into an experiment.
Inferential Statistics
Inferential statistics, as the name suggests, involves drawing the right conclusions
from the statistical analysis that has been performed using descriptive statistics. In the end,
it is the inferences that make studies important and this aspect is dealt with in inferential
statistics.
Most predictions of the future and generalizations about a population made by studying a
smaller sample come under the purview of inferential statistics. Most social science
experiments study a small sample that helps determine how the
population in general behaves. By designing the right experiment, the researcher is able
to draw conclusions relevant to the study.
While drawing conclusions, one needs to be very careful not to draw
wrong or biased conclusions. Even though this appears to be a science, there are ways in
which studies and results can be manipulated.
Population vs Sample
The population includes all objects of interest whereas the sample is only a portion of
the population. Parameters are associated with populations and statistics with samples.
Parameters are usually denoted using Greek letters (μ, σ) while statistics are usually
denoted using Roman letters (x̄, s).
There are several reasons why we don't work with populations. They are usually large,
and it is often impossible to get data for every object we're studying. Sampling does not
usually occur without cost, and the more items surveyed, the larger the cost.
We compute statistics, and use them to estimate parameters. The computation is the
first part of the statistics course (Descriptive Statistics) and the estimation is the second
part (Inferential Statistics).
Parameter
Characteristic or measure obtained from a population.
Statistic (not to be confused with Statistics)
Characteristic or measure obtained from a sample.
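As a rough illustration of the distinction, the sketch below treats a small made-up list of numbers as the whole population, computes its mean (a parameter, μ), then draws a random sample and computes the sample mean (a statistic, x̄). The data, the sample size of 20, and the seed are assumptions chosen only for this example.

```python
import random
import statistics

# Hypothetical population: the integers 1..100 (illustrative data only).
population = list(range(1, 101))

# Parameter: computed from every member of the population.
mu = statistics.mean(population)

# Statistic: computed from a sample drawn by chance.
rng = random.Random(42)               # fixed seed so the run is repeatable
sample = rng.sample(population, 20)   # 20 distinct members of the population
x_bar = statistics.mean(sample)

print("population mean (parameter):", mu)
print("sample mean (statistic):    ", x_bar)
```

The sample mean will usually be close to, but not exactly equal to, the population mean; estimating how close is the job of inferential statistics.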
Levels of Measurement
There are four levels of measurement: Nominal, Ordinal, Interval, and Ratio. These go
from lowest level to highest level. Data is classified according to the highest level which it
fits. Each additional level adds something the previous level didn't have.
Nominal is the lowest level. Only names are meaningful here. Ordinal adds an order to
the names.
Nominal Scale. Nominal variables (also called categorical variables) can be
placed into categories. They don’t have a numeric value and so cannot be
added, subtracted, divided or multiplied. They also have no order; if they
appear to have an order then you probably have ordinal variables instead.
Nominal variables have two or more categories, but no intrinsic order.
For example, a real estate agent could classify
their types of property into distinct categories such as houses, condos, co-ops
or bungalows. Other examples include brand names of motorcycles
in the Philippine market and names of public and private universities in Negros Island.
Dichotomous variables are nominal variables which have only two categories or
levels. For example, if we were looking at gender, we would most probably
categorize somebody as either "male" or "female". This is an example of a
dichotomous variable (and also a nominal variable). Another example might be
if we asked a person if they owned a mobile phone. Here, we may categorize
mobile phone ownership as either "Yes" or "No".
Ordinal variable
Ordinal variables have categories that can be ordered or ranked. For example, the
responses to a survey question can be ranked from the most positive (Yes,
a lot), to the middle response (They are OK), to the least positive (Not very much).
However, while we can rank the levels, we cannot place a "value" on them; we cannot
say that "They are OK" is twice as positive as "Not very much", for example.
Another example:
Response   Strongly disagree   Disagree   Undecided   Agree   Strongly agree
Rating     1                   2          3           4       5
Example:
Likert Scale
Net Promoter Score (NPS)
How likely is it that you would recommend this company to a friend or colleague?
Ratio variables are interval variables, but with the added condition that 0 (zero) of
the measurement indicates that there is none of that variable. So, temperature
measured in degrees Celsius or Fahrenheit is not a ratio variable because 0°C does not
mean there is no temperature. However, temperature measured in Kelvin is a ratio
variable as 0 Kelvin (often called absolute zero) indicates that there is no
temperature whatsoever. Other examples of ratio variables include height, mass,
distance and many more.
The name "ratio" reflects the fact that you can use the ratio of measurements.
So, for example, a distance of ten metres is twice the distance of 5 metres.
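The point about ratios can be checked with a little arithmetic. The sketch below is only an illustration: the two temperatures (20°C and 10°C) and the helper function names are assumptions made for the example. It shows that the "ratio" of two temperatures changes with the unit on interval scales (Celsius, Fahrenheit), while the ratio of two distances is a genuine ratio.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to Kelvin."""
    return c + 273.15

def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

warm_c, cool_c = 20.0, 10.0   # hypothetical readings

# On interval scales the quotient depends on the arbitrary zero point.
ratio_celsius = warm_c / cool_c
ratio_fahrenheit = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)

# Kelvin has a true zero, so this quotient is a meaningful ratio.
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Distance is a ratio variable: 10 metres really is twice 5 metres.
distance_ratio = 10 / 5

print(ratio_celsius, ratio_fahrenheit, round(ratio_kelvin, 4), distance_ratio)
```

20°C looks like "twice" 10°C, but the same two temperatures give about 1.36 in Fahrenheit and about 1.035 in Kelvin, so only the Kelvin figure describes a physical ratio.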
Ambiguities in classifying a type of variable
In some cases, the measurement scale for data is ordinal, but the variable is treated
as continuous. For example, a Likert scale that contains five values - strongly agree, agree,
neither agree nor disagree, disagree, and strongly disagree - is ordinal. However, where a
Likert scale contains seven or more values - strongly agree, moderately agree, agree, neither
agree nor disagree, disagree, moderately disagree, and strongly disagree - the underlying
scale is sometimes treated as continuous (although whether you should do this is a cause of
great dispute).
It is worth noting that how we categorize variables is somewhat of a choice. Whilst
we categorized gender as a dichotomous variable (you are either male or female), social
scientists may disagree with this, arguing that gender is a more complex variable involving
more than two distinctions, with further categories such as genderqueer,
intersex and transgender. At the same time, some researchers would argue that a Likert
scale, even with seven values, should never be treated as a continuous variable.
Population vs Sample – What is the difference?
Population: The measurable characteristic of the population like the mean or standard
deviation is known as the parameter.
Sample: The measurable characteristic of the sample is called a statistic.
Population: Population data is a whole and complete set.
Sample: The sample is a subset of the population that is derived using sampling.
Population: A survey done of an entire population is accurate and more precise with no
margin of error except human inaccuracy in responses. However, this may not be possible
always.
Sample: A survey done using a sample of the population bears accurate results, only after
further factoring the margin of error and confidence interval.
Population: The parameter of the population is a numerical or measurable element that
defines the system of the set.
Sample: The statistic is the descriptive component of the sample found by using sample
mean or sample proportion.
Although population and sample are two different terms, they are related to
each other. Samples are drawn from the population, and the primary purpose of a sample
is to make statistical inferences about the population. Without the population, samples can’t
exist. The better the quality of the sample, the more accurate the
generalization.
Random sampling is analogous to putting everyone's name into a hat and drawing out
several names. Each element in the population has an equal chance of being selected.
While this is the preferred way of sampling, it is often difficult to do. It requires that
a complete list of every element in the population be obtained. Computer-generated
lists are often used with random sampling.
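The "names in a hat" idea can be sketched with Python's standard `random` module. The list of 50 placeholder names, the sample size of 5, and the seed are assumptions made up for this illustration.

```python
import random

# Hypothetical sampling frame: a complete list of every element in the population.
population = [f"person_{i}" for i in range(1, 51)]

rng = random.Random(0)               # fixed seed so the draw can be repeated
sample = rng.sample(population, 5)   # draws 5 distinct names, each equally likely

print(sample)
```

`random.sample` draws without replacement, so no name can be picked twice, just as a name drawn from the hat is not put back.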
Systematic sampling is easier to do than random sampling. In systematic sampling, the
list of elements is "counted off". That is, every kth element is taken. This is similar to
lining everyone up and numbering off "1,2,3,4; 1,2,3,4; etc". When done numbering,
all people numbered 4 would be used.
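The counting-off procedure can be sketched in a few lines. The function name `systematic_sample`, the optional `start` parameter, and the twelve numbered people are assumptions introduced for this illustration.

```python
def systematic_sample(items, k, start=None):
    """Return every k-th item, beginning at 1-based position `start` (default: k)."""
    if start is None:
        start = k
    return items[start - 1::k]

# Twelve hypothetical people, labelled by their position in line.
people = list(range(1, 13))

# Counting off "1,2,3,4; 1,2,3,4; ..." and keeping everyone numbered 4.
sample = systematic_sample(people, k=4)
print(sample)  # [4, 8, 12]
```

In practice the starting position is often chosen at random between 1 and k; passing a different `start` (for example `start=2`) selects positions 2, 6, 10 instead.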
Stratified sampling divides the population into groups called strata according to
some characteristic rather than by geography. For instance, the population
might be separated into males and females. A sample is taken from each of these
strata using either random, systematic, or convenience sampling.
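A minimal sketch of stratified sampling, assuming a made-up population of eight people, two strata, and simple random sampling of two people within each stratum:

```python
import random
from collections import defaultdict

# Hypothetical population: (name, stratum) pairs.
population = [
    ("Ana", "female"), ("Ben", "male"), ("Cara", "female"), ("Dan", "male"),
    ("Eva", "female"), ("Finn", "male"), ("Gia", "female"), ("Hugo", "male"),
]

# Step 1: divide the population into strata by the chosen characteristic.
strata = defaultdict(list)
for name, group in population:
    strata[group].append(name)

# Step 2: sample within each stratum (here, simple random sampling).
rng = random.Random(1)
per_stratum = 2
sample = {group: rng.sample(members, per_stratum)
          for group, members in strata.items()}

print(sample)
```

Every stratum is guaranteed to appear in the sample, which a single simple random sample of the whole population does not ensure.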
Cluster sampling is accomplished by dividing the population into groups -- usually
geographically. These groups are called clusters or blocks. Some of the clusters are randomly
selected, and every element in the selected clusters is used.
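A minimal sketch of cluster sampling, assuming four made-up geographic clusters from which two are chosen at random:

```python
import random

# Hypothetical clusters: each region maps to the elements located in it.
clusters = {
    "north": ["n1", "n2", "n3"],
    "south": ["s1", "s2"],
    "east": ["e1", "e2", "e3", "e4"],
    "west": ["w1", "w2"],
}

# Step 1: randomly select some of the clusters.
rng = random.Random(7)
chosen = rng.sample(sorted(clusters), 2)

# Step 2: use every element of each selected cluster.
sample = [member for name in chosen for member in clusters[name]]

print(chosen, sample)
```

The randomness applies to whole clusters, not to individual elements, so the final sample size depends on which clusters happen to be drawn.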
Disadvantages
Cluster sampling: might not work well if unit members are not homogeneous (i.e. if
they are different from each other).
Simple random sampling: tedious and time consuming, especially when creating larger
samples.
Stratified random sampling: tedious and time consuming, especially when creating
larger samples.
Systematic sampling: not as random as simple random sampling.
Non-probability sampling
Snowball sampling is where research participants recruit other participants for a test
or study. It is used where potential participants are hard to find. It’s called snowball
sampling because (in theory) once you have the ball rolling, it picks up more “snow” along
the way and becomes larger and larger. Snowball sampling is a non-probability sampling
method. It does not involve the equal chances of selection found in, say, simple random sampling (where
the odds are the same for any particular participant being chosen). Rather, the researchers
use their own judgment to choose participants.
Snowball sampling consists of two steps:
1. Identify potential subjects in the population. Often, only one or two subjects can be
found initially.
2. Ask those subjects to recruit other people (and then ask those people to recruit others).
Participants should be made aware that they do not have to provide any other names.
These steps are repeated until the needed sample size is found. Ethically, the study
participants should not be asked to identify other potential participants. Rather, they should
be asked to encourage others to come forward. When individuals are named, it’s sometimes
called “cold-calling”, as you are calling out of the blue. Cold-calling is usually reserved for
snowball sampling where there’s no risk of potential embarrassment or other ethical
dilemmas.
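The two steps and their repetition can be sketched as a small simulation. The referral network, the participant names, and the function `snowball_sample` are all hypothetical and serve only to show how the sample grows from one initial subject.

```python
from collections import deque

# Hypothetical referral network: who each participant brings into the study.
referrals = {
    "ana": ["ben", "cara"],
    "ben": ["dan"],
    "cara": ["dan", "eva"],
    "dan": [],
    "eva": ["finn"],
    "finn": [],
}

def snowball_sample(seeds, referrals, target_size):
    """Grow a sample from the initial subjects until target_size is reached."""
    recruited = []
    seen = set()
    queue = deque(seeds)               # step 1: the initial subjects
    while queue and len(recruited) < target_size:
        person = queue.popleft()
        if person in seen:             # skip anyone already recruited
            continue
        seen.add(person)
        recruited.append(person)
        # Step 2: this participant's contacts join the waiting line.
        queue.extend(referrals.get(person, []))
    return recruited

sample = snowball_sample(["ana"], referrals, target_size=5)
print(sample)  # ['ana', 'ben', 'cara', 'dan', 'eva']
```

Who ends up in the sample depends entirely on the starting subject and on who knows whom, which is why selection probabilities cannot be assigned to the participants.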
Advantages:
It allows for studies to take place where otherwise it might be impossible to conduct
because of a lack of participants.
Snowball sampling may help you discover characteristics about a population that you
weren’t aware existed. For example, the casual illegal downloader vs. the for-profit
downloader.
Disadvantages:
It is usually impossible to determine the sampling error or make inferences about
populations based on the obtained sample.
Snowball sampling is also known as cold-calling, chain sampling, chain-referral sampling,
and referral sampling.