
Statistics

• is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or to
answer questions.
• is NOT an exact science. It is usually considered a distinct mathematical science rather than a branch of
mathematics.
• is about providing a measure of confidence in any conclusions.
• is concerned with facilitating wise decision-making in the face of uncertainty and, therefore, develops and
utilizes techniques for the collection, effective presentation, and proper analysis of data.
Branches of Statistics
1. Descriptive Statistics – description and summarization of data. It deals with the techniques used in the
collection, presentation, organization, and analysis of the data on hand.
2. Inferential Statistics – drawing of conclusions from data. It deals with the techniques used in generalizing
from samples to populations, performing estimations and hypothesis tests, determining relationships among
variables, and making predictions.
Functions of Statistics
1. Condensation – to reduce or to lessen. Condensation makes a huge mass of data easier to understand by
summarizing it in only a few observations.
2. Comparison – Classification and tabulation are the two methods that are used to condense the data. They
help us to compare data collected from different sources. Grand totals, measures of central tendency,
measures of dispersion, graphs and diagrams, coefficient of correlation, etc. provide ample scope for
comparison. As statistics is an aggregate of facts and figures, comparison is always possible and in fact
comparison helps us to understand the data in a better way.
3. Forecasting – to predict or to estimate beforehand. Given data on the number of students enrolled in PUP
over the last ten years, it is possible to predict or forecast the number of students that will enroll in the
near future. Forecasting also plays a dominant role in business in connection with production, sales, profits,
etc. Time series analysis and regression analysis play an important role in forecasting.
4. Estimation – drawing inferences about a population from the analysis of a sample drawn from that
population.
5. Test of Hypothesis – A statistical hypothesis is a statement about the probability distribution
characterizing a population, made on the basis of the information available from the sample observations.
Statistical methods are extremely useful in formulating and testing hypotheses. Whether the grades of students
increased because they are motivated, or whether a new teaching method is effective in discussing a particular
topic, are examples of hypotheses that can be tested with proper statistical tools.
Limitations of Statistics
1. Statistics is not suitable for the study of qualitative phenomena. Since statistics is basically a science that
deals with sets of numerical data, it is applicable only to subjects of enquiry that can be
expressed in terms of quantitative measurements. Qualitative phenomena like honesty,
poverty, beauty, and intelligence cannot be expressed numerically in a direct way, so statistical analysis cannot
be directly applied to them.
2. Statistics does not study individuals. Statistics does not give any specific importance to the individual items;
in fact, it deals with an aggregate of objects. Individual items, when they are taken individually do not constitute
any statistical data and do not serve any purpose for any statistical enquiry.
3. Statistical laws are not exact. It is well known that mathematical and physical sciences are exact. But
statistical laws are not exact and statistical laws are only approximations. Statistical conclusions are not
universally true. They are true only on average.
4. Statistics may be misused. Statistics must be used only by experts; otherwise, statistical methods are
the most dangerous tools in the hands of the inexpert. The use of statistical tools by inexperienced and
untrained people might lead to wrong conclusions.
5. Statistics is only one of the methods of studying a problem. Statistical methods do not provide a complete
solution, because problems must also be studied with the background of the country, culture,
philosophy, or religion taken into consideration. Thus, a statistical study should be supplemented by other
evidence.
Steps in Statistical Investigation
1. Defining the problem
a. Identify a specific problem.
b. Define the scope and limitations, assumptions to be made, and expected outcomes.
• If the group under consideration consists of a large number of objects, we try to obtain information about the
group by examining a subgroup of it.
• Population – the total collection of all the elements that we are interested in.
• Sample – a subgroup of the population that will be studied in detail.

2. Collection of data
a. Make sure to collect the data properly.
b. Incomplete, fabricated, outdated, and inaccurate data are useless.
• For the data from the sample to be informative about the population, it must be representative of the
population, meaning the sample was obtained in such a way that every member of the population had an
equal chance to be included in the sample.
• Random sample – a sample of k members of a population. It is also called a simple random sample if the
members are chosen in such a way that all the possible choices of the k members are equally likely.
• After a random sample is obtained from the population, we can use statistical inference to draw
generalizations about the population by examining the members of the sample.
• Sampling – the process of obtaining a sample from the population.
o Probability sampling/random sampling – method of sampling in which every member of the
population has an equal chance of being selected for the sample. It is the preferred sampling method
for properly using the techniques of statistical inference.
▪ Simple random sampling – all possible subsets consisting of n elements selected from
the N elements of the population have the same chances of selection.
▪ Systematic random sampling – the selection of the first element is at random and the
selection of other elements in the system is systematic by subsequently taking every kth
element from the random start where k is the sampling interval.
▪ Stratified random sampling – we partition the population into non-overlapping strata
or groups, and then a proportional sample is chosen from each stratum. The actual sample is
the union of the samples drawn from each stratum.
▪ Cluster sampling – we partition the population into non-overlapping groups or clusters
consisting of one or more elements, and then select a sample of clusters. Every member
of each selected cluster is included in the sample.
o Non-probability sampling - method of sampling in which every member of the population does
NOT have an equal chance of being selected as sample.
▪ Accidental sampling – Sample is chosen by the researcher by obtaining members of the
population in a convenient, often haphazard way.
▪ Quota sampling - A specified number of persons of certain types is included in
the sample. The researcher is aware of categories within the population and draws samples
from each category. The size of each categorical sample is proportional to the proportion
of the population that belongs in that category.
▪ Purposive sampling - The researcher uses his or her judgment in choosing the members
he or she believes are representative of the population.
▪ Snowball sampling - This technique is also called referral sampling. A primary set of
samples is chosen based on the criteria set by the researcher. Information on where to
find succeeding sets of samples meeting the same criteria is gathered from this primary
set in order to expand the number of samples.
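The four probability sampling methods above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the population of 100 numbered IDs, the two strata, and the ten clusters are all hypothetical, and the fixed seed is there only to make the run repeatable.

```python
import random


def simple_random_sample(population, n, rng):
    """Every subset of n elements has the same chance of selection."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """Random start among the first k elements, then every k-th element."""
    start = rng.randrange(k)
    return population[start::k]


def stratified_sample(strata, fraction, rng):
    """Draw the same fraction from each stratum (proportional allocation)."""
    sample = []
    for members in strata.values():
        size = round(len(members) * fraction)
        sample.extend(rng.sample(members, size))
    return sample


def cluster_sample(clusters, m, rng):
    """Select m whole clusters; every member of a chosen cluster is included."""
    chosen = rng.sample(list(clusters), m)
    return [member for name in chosen for member in clusters[name]]


rng = random.Random(0)              # fixed seed, for repeatability only
population = list(range(1, 101))    # hypothetical population, N = 100

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)   # sampling interval k = 10

strata = {"A": population[:60], "B": population[60:]}  # hypothetical strata
stratified = stratified_sample(strata, 0.10, rng)      # 6 from A, 4 from B

clusters = {i: population[i * 10:(i + 1) * 10] for i in range(10)}
clustered = cluster_sample(clusters, 2, rng)           # 2 clusters of 10 members
```

Each call returns a list of selected members. Note that the stratified sample keeps the 60/40 split of the population, while the cluster sample takes everyone from the two chosen clusters.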
• Methods of data collection
o Survey method – method of collecting data on the variable of interest by asking people questions
either through interview or using questionnaires.
o Observation – method of obtaining data or information using our primary senses.
o Experiment – method of collecting data where there is direct human intervention on the
conditions that may affect the values of the variable of interest.
• Variables – characteristics that differentiate every individual within the population/sample.
o Qualitative variables - variables that yield categorical responses. It is a word or a code that
represents a class or category.
o Quantitative variables – take on numerical values representing an amount or quantity.
▪ Discrete variables – have either a finite number of possible values or a countable number
of possible values.
▪ Continuous variables - have an infinite number of possible values that are not countable.
• Level of measurement – relationship among the values that are assigned to the attributes for a variable.
o Nominal level – Identify, name, classify or categorize objects or events.
▪ Examples: Method of payment, type of school, eye color.
o Ordinal level – Identify, name, classify, or categorize objects or events but have an additional
property of a logical or natural order to the categories or values.
▪ Examples: Rank of a military officer, socioeconomic class
o Interval level – Identify, have ordered values, and have the additional property of equal distances
or intervals between scale values.
▪ Examples: Temperature on Fahrenheit/Celsius Thermometer, Trait anxiety, IQ
o Ratio level – Identify, order, represent equal distances between score values, and have an
absolute zero point.
▪ Examples: Height, weight, Number of words correctly spelled.

3. Summarization and tabulation of data


a. This refers to the organization of data in text, tables, graphs, and charts, so that logical conclusions can be
derived from them.
b. Explore the data to obtain additional insight that could contribute to the study.
4. Analysis of data
a. This pertains to the process of deriving from the given data relevant information from which
numerical descriptions can be formulated.
b. Summarized data must be examined so that insights and meaningful information can be produced to
support decision-making or solutions to the question or problem at hand.
5. Interpretation of data and results
a. Refers to the task of drawing conclusions from the analyzed data.
b. Results must be able to answer the research problem and give recommendations.
6. Presentation of the result
a. Present all pertinent results in a clear and concise manner.
b. Use an appropriate form of media to present results.

Presentation of data
1. Textual form – data are presented in paragraphs of text. The text highlights the important figures or results
that the researcher wishes to focus on.
2. Tabular form – data appears in a systematic manner in rows and columns.
• Types of Tables
o Simple or One-way table – has one category and usually used for frequency tables.
Year Frequency
2012 13,450
2013 13,200
2014 15,389
2015 16,790
2016 18,900
2017 19,500
Total 97,229
o Two-way table – has multiple categories.
                                  Year
Sex       2012    2013    2014    2015    2016    2017    Total
Male      5560    6095    7386    8056    7945    6541    41493
Female    7890    7105    8003    8374   10955   13049    55736
Total    13450   13200   15389   16790   18900   19500    97229
• Construction of Data Tables
o Tables should be as simple as possible.
o Tables should be self-explanatory.
▪ Title should be clear and to the point.
▪ Each row and column should be labelled.
▪ Numerical entries of zero should be explicitly written rather than indicated by a dash.
Dashes are reserved for missing or unobserved data.
▪ Totals should be shown either in the top row and the first column or in the last row
and last column.
• Frequency distribution – tells how often a variable takes on each of its possible values.
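o Constructing a Frequency Table (Qualitative)
▪ Use: =FREQUENCY(data_array, bins_array)
▪ FREQUENCY counts numeric values only, so the categories must be coded as numbers.
▪ Then press Ctrl + Shift + Enter to enter it as an array formula:
• {=FREQUENCY(data_array, bins_array)}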
o Constructing a Frequency Table (Quantitative)
▪ Set the intervals or ranges for your data. These are needed for the “BIN RANGE”.
▪ Click “DATA” on the menu bar and click “DATA ANALYSIS” on the tool bar.
▪ The dialog box “DATA ANALYSIS” will appear. Choose “HISTOGRAM” in the
dialog box, then click OK.
▪ Highlight your data for the “INPUT RANGE”.
▪ Highlight your intervals for the “BIN RANGE”.
▪ Click the box of “LABELS IN FIRST ROW” then click “OK”.
▪ The result will appear on a new worksheet of the Excel file. Compute the percentages and
the total.
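The same frequency tables can also be built outside Excel. The following is a minimal Python sketch, not part of the Excel procedure above: the function names and the sample data are hypothetical, and the bins are assumed to be upper limits, with one extra bin for values above the last limit.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, frequency, percentage) rows for categorical data."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(value, freq, 100 * freq / total)
            for value, freq in sorted(counts.items())]


def binned_frequencies(values, upper_limits):
    """Count values per bin. A value falls in the first bin whose upper
    limit is >= the value; values above every limit go to a final bin."""
    counts = [0] * (len(upper_limits) + 1)
    for v in values:
        for i, limit in enumerate(upper_limits):
            if v <= limit:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # larger than the last upper limit
    return counts


# Hypothetical qualitative data: method of payment
payment = ["cash", "card", "cash", "online", "card", "cash"]
rows = frequency_table(payment)

# Hypothetical quantitative data with bins <=60, 61-70, 71-80, 81-90, >90
scores = [55, 62, 70, 71, 85, 90, 100]
bins = binned_frequencies(scores, [60, 70, 80, 90])

for value, freq, pct in rows:
    print(f"{value:<8}{freq:>4}{pct:>8.1f}%")
print(bins)
```

The percentage column is each frequency divided by the total number of observations, which is the "percentage and total" step done by hand in the spreadsheet.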
3. Graphical form – data or relationships among variables can be presented in visual form, through graphs or
diagrams. In that manner, the reader can easily perceive what is meant by the figure or any trend being
portrayed by the data.
a. Bar graph (Vertical Bar/Column Charts) – for showing comparison of amount of a variable of
interest collected over time.
i. Simple Chart

ii. Grouped Column Charts

iii. Subdivided Column Charts

b. Histogram – like the bar graph but the base of the rectangle has a length exactly equal to the class
width of the corresponding interval. Also, there are no spaces between rectangles.
c. Pictograph – similar to the bar graph, but instead of bars, we use pictures or symbols to represent a
value or an amount.
d. Pie graph – a circular graph partitioned into several sections, depicting relative percentages with respect
to the total distribution.
e. Line graph – a graph used to visualize data that changes continuously over time.
i. Simple line graph
ii. Multiple line graph

f. Statistical map – used to show data in geographical areas.


Measure of Central Tendency – also known as an average; a location measure that pinpoints the center or typical
middle value of a data set. It is a convenient way of describing a set of data with a single value that describes the
average characteristic of the data set.

• Mean (\(\bar{x}\))
o Arithmetic mean – obtained by adding all of its observed values and dividing the sum by the total
number of observations.
\[ \bar{x} = \frac{\sum x}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
o Weighted mean
\[ \bar{x} = \frac{\sum w x}{\sum w} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} \]

• Median (\(\tilde{x}\)) – middle value of a data set if the observations are arranged either in increasing or decreasing
order.
o Outliers in the data set have little or no effect on the median, so it is preferred over the mean when the data
contain outliers.
o If the number of data values \(N\) is odd, the median is the middle data value (the value at position \(N/2\)
rounded up).
o If the number of data values \(N\) is even, the median is the mean of the two middle values (the values at
positions \(N/2\) and \(N/2 + 1\)).
• Mode (\(\hat{x}\)) – most frequent observation in a given data set.
o Outliers in the data set do not affect the mode.
o The mode of a data set may not exist, and when it exists it is not always unique.
o It is the only measure of central tendency appropriate at the nominal level.
o Unimodal – has only one mode.
o Multimodal – has more than one mode.
▪ Bimodal - has two modes.
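The measures above can be checked with Python's standard `statistics` module. This is a small sketch with hypothetical data; `weighted_mean` is a helper written here for illustration and simply applies the weighted mean formula.

```python
import statistics


def weighted_mean(values, weights):
    """Sum of w*x divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


data = [3, 5, 5, 7, 10]                          # hypothetical observations

mean = statistics.mean(data)                     # (3 + 5 + 5 + 7 + 10) / 5
median_odd = statistics.median(data)             # N = 5: the 3rd ordered value
median_even = statistics.median([3, 5, 7, 10])   # N = 4: mean of 2nd and 3rd
modes = statistics.multimode([1, 1, 2, 2, 3])    # bimodal data: two modes

# Hypothetical scores of 80 and 90 with weights 1 and 3
wm = weighted_mean([80, 90], [1, 3])

print(mean, median_odd, median_even, modes, wm)
```

`statistics.multimode` returns every value tied for the highest frequency, so a unimodal data set gives a one-element list and a bimodal one gives two elements.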
Measure of Dispersion or Variability – descriptive summary measures that help us characterize the data set in
terms of how varied the observations are from the center. If its value is small, then this indicates that the observations
are not too different from the center. On the other hand, if its value is large, then this indicates that the observations
are very different from the center or that they are widely spread out from the center.

• Range – difference between the largest and the smallest observations of items in a set of data.
o It is a limited measure because it depends on only two of the numbers (highest and lowest) in the
data set.
o It can easily be affected by outliers.
o It does not provide any information regarding the concentration of the data from the center.
• Variance
o Variance of a population data set
\[ \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 \]
o Variance of a sample data set
\[ s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \]

▪ Dividing by \(n - 1\) instead of \(n\) removes the “bias” in \(s^2\) when we want it to estimate \(\sigma^2\) for the
purpose of making inferences.
o Variance is measured in squared units because we take the squares of the deviations
(nonnegative quantities).
• Standard deviation
o Standard deviation of a population data set
\[ \sigma = \sqrt{\sigma^2} \]
o Standard deviation of a sample data set
\[ s = \sqrt{s^2} \]
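As a worked check of the formulas for range, variance, and standard deviation, the sketch below computes them by hand and compares the results with Python's `statistics` module. The data set and the `variance` helper are hypothetical and are used only for illustration.

```python
import math
import statistics


def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 for a
    sample or by N for a population."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical observations, mean = 5

data_range = max(data) - min(data)   # largest minus smallest

pop_var = variance(data, sample=False)   # divide by N = 8
samp_var = variance(data, sample=True)   # divide by n - 1 = 7

pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

# The standard library implements the same definitions.
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(samp_var, statistics.variance(data))
assert math.isclose(pop_sd, statistics.pstdev(data))
assert math.isclose(samp_sd, statistics.stdev(data))

print(data_range, pop_var, samp_var, pop_sd, samp_sd)
```

The squared deviations here sum to 32, so the population variance is 32/8 and the sample variance is 32/7. The sample value is always the larger of the two for the same data, which reflects the \(n - 1\) correction.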
