
Progressive Portfolio in

Advanced Statistics

A Portfolio

Presented to

Dr. Bernie Rivas

City University of Pasay

Pasadeña St., Pasay City

In Partial Fulfillment
of the Requirements for the subject

Advanced Statistics

By

Christian B. Manginsay

November 2021

Introduction

In this course, I learned about the different concepts in Advanced Statistics and applied what I know in every lesson. This portfolio is a collection of the summaries, reflections, and formative exams produced over several weeks. I believe this course will help me establish knowledge that I can use in the future when I am teaching, and it will greatly help me in the field of education.

Preliminary Statement

I created this portfolio to help me assess myself, gain new knowledge, discover my strengths and weaknesses in all the discussed topics, and improve what is needed. As a progressive portfolio, it records the details of my thoughts and what I felt during the whole course unit.

Goals

The goals of making this progressive portfolio are the following:

• To assess whether I successfully integrated the learning objectives of the course Advanced Statistics.

• To develop and reflect on how the discussed topics could help me in my future career as an educator.

Acknowledgment

Apart from my own efforts, the success of any project depends largely on the encouragement and guidance of others. I take this opportunity to express my gratitude to the people who have been instrumental in the successful completion of this portfolio, which gives me much pleasure.

I would like to show my greatest gratitude and appreciation to Dr. Bernie Rivas. I can't thank him enough for his tremendous support and help. I feel motivated and encouraged every time I attend his lecture. Without his encouragement and guidance, this portfolio would not have been possible.

The guidance and support I received from all my classmates were also vital to the success of this portfolio, and I am grateful for their constant help. Last but not least, I would like to extend my deepest gratitude to all those who have directly and indirectly guided me in producing this portfolio.

Table of Contents

Title Page

Introduction, Preliminary Statement, Goals

Acknowledgment

Summaries

Reflections

Summaries

1. Measures of Central Tendency

Data can be classified in various forms. One way to distinguish between data is in terms of grouped and ungrouped data. What is ungrouped data? When the data has not been placed in any categories and no aggregation or summarization has taken place, it is known as ungrouped data; ungrouped data is also known as raw data. What is grouped data? When raw data have been grouped into different classes, they are said to be grouped data. Before we study more about grouped and ungrouped data, it is important to understand what we mean by "central tendency." As the name suggests, central tendency has something to do with the center: it is the central location in a probability distribution. There are many measures of central tendency, such as the mean, mode, and median. For ungrouped data: MODE: the most frequently occurring value in a data set; a data set is bimodal when there is a tie between two values, and multimodal when more than two values share the same highest frequency. MEDIAN: the middlemost value in the ordered arrangement of the values in the data set. MEAN: also known as the arithmetic average, calculated by the summation of all values divided by the number of values.
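
To make these three measures concrete, here is a minimal Python sketch using the built-in statistics module; the data values are invented for illustration.

```python
# Mean, median, and mode of a small ungrouped (raw) data set.
import statistics

data = [7, 3, 9, 3, 5, 8, 3, 6, 5]

mean = statistics.mean(data)      # sum of all values / number of values
median = statistics.median(data)  # middle value of the ordered data
mode = statistics.mode(data)      # most frequently occurring value

print(f"mean={mean:.2f}, median={median}, mode={mode}")
# mean=5.44, median=5, mode=3
```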

2. Fractiles

Fractiles are measures of location or position, which include not only central location but any position based on the number of equal divisions in a given distribution. The most commonly used fractiles are the quartiles, deciles, and percentiles. QUARTILES divide a distribution into four equal parts and are denoted by Q1, Q2, and Q3. DECILES are values that divide a distribution into 10 equal parts. PERCENTILES are values that divide the distribution into 100 equal parts.
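
As a small illustration, the standard library can compute all three kinds of fractiles; the data set below is invented, and statistics.quantiles returns the n − 1 cut points for n equal divisions.

```python
# Quartiles, deciles, and percentiles of a simple data set.
import statistics

data = list(range(1, 101))  # the values 1 through 100

quartiles = statistics.quantiles(data, n=4)      # Q1, Q2, Q3
deciles = statistics.quantiles(data, n=10)       # D1 ... D9
percentiles = statistics.quantiles(data, n=100)  # P1 ... P99

print("quartiles:", quartiles)
print("D5 equals the median:", deciles[4])
print("P25 equals Q1:", percentiles[24])
```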

3. Measures of Dispersion

A measure of dispersion indicates the scattering of data. It explains how the data values differ from one another, delivering a precise view of their distribution, and it gives us an idea of how much individual items vary around the central value.

In other words, dispersion is the extent to which values in a distribution differ from the average of the distribution. It gives us an idea about the extent to which individual items vary from one another, and from the central value.
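
A minimal sketch of the idea, with two made-up samples that share the same mean but have very different dispersion:

```python
# Range and standard deviation distinguish samples with equal means.
import statistics

tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, sample in [("tight", tight), ("wide", wide)]:
    print(name,
          "mean:", statistics.mean(sample),
          "range:", max(sample) - min(sample),
          "stdev:", round(statistics.stdev(sample), 2))
# Both means are 50, but the measures of dispersion differ sharply.
```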

4. Introduction to Correlation

To summarize, correlation means describing a relationship between two variables. One way to examine a possible correlation between two variables is a scatter diagram: a graph of ordered pairs (x, y), where x is the independent variable and y is the dependent variable. All correlations have two properties: strength and direction. The strength refers to the numerical value, while the direction refers to whether the correlation is positive or negative. There are three types of correlation: positive, negative, and no correlation at all.
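
As an illustration, a scatter diagram can be drawn in a few lines; this sketch assumes the matplotlib library is installed, and the (x, y) pairs are invented.

```python
# A scatter diagram of ordered pairs (x, y).
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6, 7, 8]                  # independent variable
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.3, 8.8]  # dependent variable

plt.scatter(x, y)
plt.xlabel("x (independent variable)")
plt.ylabel("y (dependent variable)")
plt.title("Points rising from left to right suggest a positive correlation")
plt.show()
```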

5. Spearman Rank Correlation

When should you use Spearman's rank-order correlation? It is the nonparametric version of the Pearson product-moment correlation. Spearman's correlation coefficient (ρ, also denoted rs) measures the strength and direction of the monotonic association between two ranked variables. Monotonicity is "less restrictive" than linearity, and a monotonic relationship is not strictly an assumption of Spearman's correlation: you can run a Spearman's correlation on a non-monotonic relationship to determine whether there is a monotonic component to the association. However, you would normally pick a measure of association, such as Spearman's correlation, that fits the pattern of the observed data.
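
A minimal example, assuming SciPy is available; the paired values are invented so that y is a nonlinear but perfectly monotonic function of x.

```python
# Spearman's rank-order correlation on a monotonic, nonlinear relation.
from scipy.stats import spearmanr

x = [10, 20, 30, 40, 50, 60]
y = [1, 4, 9, 16, 25, 36]  # increases whenever x increases

rho, p_value = spearmanr(x, y)
print(f"rho = {rho:.2f}, p = {p_value:.4f}")
# rho = 1.00: a perfect monotonic association even though it is not linear
```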

6. Pearson Product Moment Correlation

What does this test do? The Pearson product-moment correlation coefficient (or Pearson correlation coefficient, for short) is a measure of the strength of a linear association between two variables and is denoted by r. Basically, a Pearson product-moment correlation attempts to draw a line of best fit through the data of two variables, and the Pearson correlation coefficient, r, indicates how far all these data points lie from this line of best fit (i.e., how well the data points fit this new model). What values can the Pearson correlation coefficient take? It can range from +1 to -1. A value of 0 indicates that there is no association between the two variables. A value greater than 0 indicates a positive association; that is, as the value of one variable increases, so does the value of the other. A value less than 0 indicates a negative association; that is, as the value of one variable increases, the value of the other decreases.
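
To show how r is computed, here is a short sketch from its definition (the covariance of the two variables divided by the product of their spreads); the data are made up.

```python
# Pearson's r computed directly from its formula.
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
denominator = math.sqrt(sum((xi - mean_x) ** 2 for xi in x)
                        * sum((yi - mean_y) ** 2 for yi in y))
r = numerator / denominator
print(f"r = {r:.3f}")  # positive: y tends to rise as x rises
```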

7. Multiple Correlation

The multiple correlation coefficient generalizes the standard coefficient of correlation. It is used
in multiple regression analysis to assess the quality of the prediction of the dependent variable. It
corresponds to the squared correlation between the predicted and the actual values of the
dependent variable. It can also be interpreted as the proportion of the variance of the dependent
variable explained by the independent variables. When the independent variables (used for
predicting the dependent variable) are pairwise orthogonal, the multiple correlation coefficient is
equal to the sum of the squared coefficients of correlation between each independent variable
and the dependent variable. This relation does not hold when the independent variables are not
orthogonal. The significance of a multiple coefficient of correlation can be assessed with an F
ratio. The magnitude of the multiple coefficient of correlation tends to overestimate the
magnitude of the population correlation, but it is possible to correct for this overestimation.
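
A small NumPy sketch of this idea, with an invented two-predictor data set: the multiple correlation is obtained from the correlation between the actual and predicted values of the dependent variable, and its square is the proportion of variance explained.

```python
# Multiple correlation from a least-squares fit with two predictors.
import numpy as np

x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
x2 = np.array([2, 1, 4, 3, 6, 5], dtype=float)
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

# Fit y = b0 + b1*x1 + b2*x2 by least squares.
X = np.column_stack([np.ones_like(x1), x1, x2])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coeffs

R = np.corrcoef(y, y_hat)[0, 1]
print(f"R = {R:.3f}, R^2 (variance explained) = {R**2:.3f}")
```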

8. Partial Correlation

Partial correlation is a method used to describe the relationship between two variables while removing the effects of another variable, or several other variables, on this relationship. Partial correlation analysis is aimed at finding the correlation between two variables after removing the effects of other variables. This type of analysis helps spot spurious correlations (i.e., correlations explained by the effect of other variables) as well as reveal hidden correlations (i.e., correlations masked by the effect of other variables).
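
One common way to compute a partial correlation is to regress each variable on the control variable and correlate the residuals; the sketch below assumes NumPy and uses simulated data in which a third variable z drives both x and y.

```python
# Partial correlation of x and y controlling for z, via residuals.
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=200)                 # the controlled-for variable
x = z + rng.normal(scale=0.5, size=200)  # x driven partly by z
y = z + rng.normal(scale=0.5, size=200)  # y driven partly by z

def residuals(v, z):
    """Remove the linear effect of z from v."""
    slope, intercept = np.polyfit(z, v, 1)
    return v - (slope * z + intercept)

r_xy = np.corrcoef(x, y)[0, 1]
r_xy_given_z = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]
print(f"raw r = {r_xy:.2f}, partial r controlling for z = {r_xy_given_z:.2f}")
# The raw correlation is large but the partial correlation is near zero:
# a spurious correlation explained by z.
```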

9. Point Biserial Correlation and Phi Coefficient


A point-biserial correlation is simply the correlation between one dichotomous variable and one continuous variable. It turns out that this is a special case of the Pearson correlation, so computing the point-biserial correlation is equivalent to computing the Pearson correlation when one variable is dichotomous and the other is continuous. Like all correlation analyses, the point-biserial correlation measures the strength of association or co-occurrence between two variables and expresses it in a single value, the correlation coefficient. It measures the strength of association between a continuous-level variable (ratio or interval data) and a binary variable. The phi coefficient is the analogous special case of the Pearson correlation when both variables are dichotomous.
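
A minimal example, assuming SciPy; the group codes and scores are invented.

```python
# Point-biserial correlation between a dichotomous and a continuous variable.
from scipy.stats import pointbiserialr

group = [0, 0, 0, 0, 1, 1, 1, 1]          # dichotomous variable
score = [55, 60, 58, 62, 75, 80, 78, 83]  # continuous variable

r_pb, p_value = pointbiserialr(group, score)
print(f"r_pb = {r_pb:.3f}, p = {p_value:.4f}")
# Because this is a special case of Pearson's r, scipy.stats.pearsonr
# on the same two lists gives the same coefficient.
```
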
10. Multiple Regression
There are two or more independent variables in multiple regression, but only one dependent variable. Multiple regression has two main issues: overfitting and multicollinearity. Overfitting is the inclusion of an independent variable, or variables, that do not contribute to the prediction; multicollinearity is the occurrence of independent variables being linked to one another. A multiple regression analysis identifies potential variables (the independent variables and the dependent variable), collects data on the variables, uses scatterplots and correlations to check the relationships between each independent variable and the dependent variable, uses scatterplots and correlations to check the relationships among the independent variables, and then uses the non-redundant independent variables in the analysis to find the best-fitting model.
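
The sketch below walks through that workflow with NumPy on an invented data set that contains one redundant (collinear) predictor.

```python
# Check predictor correlations, drop the redundant one, fit the model.
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = 2 * x1 + rng.normal(scale=0.01, size=100)  # nearly a copy of x1
y = 3 + 1.5 * x1 + rng.normal(scale=0.3, size=100)

# Step 1: check relationships among the independent variables.
print("corr(x1, x2) =", round(np.corrcoef(x1, x2)[0, 1], 3))  # ~1.0: collinear

# Step 2: keep the non-redundant predictor and fit the reduced model.
X = np.column_stack([np.ones_like(x1), x1])
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"best-fitting model: y = {b0:.2f} + {b1:.2f}*x1")
```
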
11. Simple Regression
A simple regression is a regression model that evaluates the relationship between one independent and one dependent variable using a straight line. The letter Y stands for the dependent variable, and the letter X stands for the independent variable: the dependent variable is so called because its values are affected by the other variable, while the independent variable's values are unaffected. The linear regression equation is y = a + bx, where y is the dependent variable, a is the y-intercept, b is the slope, and x is the independent variable. After solving for the y-intercept and slope, substitute them into y = a + bx to obtain the regression equation.
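
To illustrate the computation, here is a small sketch that solves for the slope and y-intercept from made-up data and then substitutes them into y = a + bx.

```python
# Simple linear regression: solve for a (intercept) and b (slope).
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x  # the line passes through (mean_x, mean_y)

print(f"y = {a:.2f} + {b:.2f}x")
print("prediction at x = 6:", round(a + b * 6, 2))
```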

12. Aggregate Price Index


The amount of change in a group of prices over time is represented by the aggregate price index value. To get an index number, divide the current value by the base value and multiply the result by 100 to express the index as a percentage. The Laspeyres index weights prices by constant base-period quantities, while the Paasche index weights prices by current-period quantities. The advantage of the Laspeyres index is that it needs only a minimal amount of quantity data, from the base period alone; its drawback is that it does not reflect changes in purchasing patterns over time. The Paasche index likewise has both benefits and downsides: it has the advantage of using current-period quantities, so it reflects current purchasing patterns, but it requires a significant amount of data for each period, which may be difficult to obtain.
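
Here is a short sketch of both indexes for a made-up two-item market basket (p = price, q = quantity; 0 = base period, t = current period).

```python
# Laspeyres and Paasche aggregate price indexes.
p0 = [2.00, 5.00]  # base-period prices
q0 = [10, 4]       # base-period quantities
pt = [2.50, 6.00]  # current-period prices
qt = [12, 3]       # current-period quantities

# Laspeyres: current prices weighted by base-period quantities.
laspeyres = 100 * (sum(p * q for p, q in zip(pt, q0))
                   / sum(p * q for p, q in zip(p0, q0)))

# Paasche: current prices weighted by current-period quantities.
paasche = 100 * (sum(p * q for p, q in zip(pt, qt))
                 / sum(p * q for p, q in zip(p0, qt)))

print(f"Laspeyres = {laspeyres:.1f}, Paasche = {paasche:.1f}")
```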

Reflections

1. Measures of Central Tendency

Measures of central tendency are often used in research to get an idea of where most data
values lie. Other data measures that are closely related to measures of central tendency are
variance and standard deviation. The most commonly used measures of central tendency
are mean, mode and median. These measures are mostly used by primary researchers
during data analysis.

2. Fractiles

Fractiles are important in engineering and scientific applications, and a form of them is one of the first real-life exposures many of us get to statistics, as our parents look up the growth percentiles of our baby siblings and we look up the percentile our SAT scores fall in.

3. Measures of Dispersion

In my own understanding, while measures of central tendency are used to estimate "normal" values of a dataset, measures of dispersion are important for describing the spread of the data, or its variation around a central value. Two distinct samples may have the same mean or median, but completely different levels of variability, or vice versa. A proper description of a set of data should include both of these characteristics. There are various methods that can be used to measure the dispersion of a dataset, each with its own set of advantages and disadvantages.

4. Introduction to Correlation

A correlation coefficient is a single number that represents the degree of association between two sets of measurements. It ranges from +1 (perfect positive correlation) through 0 (no correlation at all) to -1 (perfect negative correlation). Correlations are easy to calculate, but their interpretation is difficult because the apparent size of the correlation can be affected by so many different things. Correlations measure the relationship between two variables, and establishing correlations allows researchers to make predictions that increase the knowledge base. Different methods of establishing correlations are used in different situations. Each method has advantages and disadvantages that provide researchers information that is used to understand, rank, and visually illustrate how variables are related.

5. Spearman Rank Correlation

The sign of the Spearman correlation indicates the direction of association between X (the
independent variable) and Y (the dependent variable). If Y tends to increase when X
increases, the Spearman correlation coefficient is positive. If Y tends to decrease when X
increases, the Spearman correlation coefficient is negative. A Spearman correlation of zero
indicates that there is no tendency for Y to either increase or decrease when X increases.
The Spearman correlation increases in magnitude as X and Y become closer to being
perfectly monotone functions of each other. When X and Y are perfectly monotonically
related, the Spearman correlation coefficient becomes 1. A perfectly monotone increasing relationship implies that for any two pairs of data values (Xi, Yi) and (Xj, Yj), the differences Xi − Xj and Yi − Yj always have the same sign. A perfectly monotone decreasing relationship implies that these differences always have opposite signs.

6. Pearson Product Moment Correlation

In my understanding, the Pearson Product Moment Correlation evaluates the linear relationship between two continuous variables. A relationship is linear when a change in one variable is associated with a proportional change in the other variable. Pearson correlation is a parametric statistic and requires interval data for both variables. To test its significance, we assume normality of both variables. Pearson's correlation coefficient captures the effect of change in one variable when the other variable changes. For example, up to a certain age, (in most cases) a child's height will keep increasing as his or her age increases.

7. Multiple Correlation

In statistics, the coefficient of multiple correlation is a measure of how well a given variable can be predicted using a linear function of a set of other variables. It is the correlation between the variable's values and the best predictions that can be computed linearly from the predictive variables. The coefficient of multiple correlation takes values between 0 and 1. Higher values indicate higher predictability of the dependent variable from the independent variables, with a value of 1 indicating that the predictions are exactly correct and a value of 0 indicating that no linear combination of the independent variables is a better predictor than the fixed mean of the dependent variable. The coefficient of multiple correlation can be computed as the square root of the coefficient of determination, but only under the particular assumptions that an intercept is included and that the best possible linear predictors are used, whereas the coefficient of determination is defined for more general cases, including those of nonlinear prediction and those in which the predicted values have not been derived from a model-fitting procedure.

8. Partial Correlation

Partial correlation measures the strength of a relationship between two variables, while
controlling for the effect of one or more other variables. For example, you might want to
see if there is a correlation between amount of food eaten and blood pressure, while
controlling for weight or amount of exercise. It’s possible to control for multiple variables
(called control variables or covariates). However, more than one or two is usually not
recommended because the more control variables, the less reliable your test. Partial
correlation has one continuous independent variable (the x-value) and one continuous
dependent variable (the y-value); This is the same as in regular correlation analysis. In the
blood pressure example above, the independent variable is “amount of food eaten” and the
dependent variable is “blood pressure”. The control variables — weight and amount of
exercise — should also be continuous.

9. Point Biserial Correlation and Phi Coefficient

The point-biserial correlation coefficient is a measure to estimate the degree of relationship between a naturally dichotomous nominal variable and an interval or ratio variable. For example, a researcher might want to examine the degree of relationship between gender (a naturally occurring dichotomous nominal scale) and students' performance in a final examination testing persuasion skills and knowledge as measured by scores (0–100 points; a ratio scale). Certainly, a variety of different correlation coefficients (such as the Pearson correlation coefficient, phi correlation coefficient, Spearman's rho, partial correlation, and part correlation) have been developed over the years for measuring relationships between sets of data.

10. Multiple Regression

Multiple regression is a statistical technique that can be used to analyze the relationship between a single dependent variable and several independent variables. The objective of multiple regression analysis is to use the independent variables whose values are known to predict the value of the single dependent variable. Each predictor value is weighted, the weights denoting its relative contribution to the overall prediction. Often, you'll want to use some nominal variables in your multiple regression. For example, if you're doing a multiple regression to try to predict blood pressure (the dependent variable) from independent variables such as height, weight, age, and hours of exercise per week, you'd also want to include sex as one of your independent variables. This is easy: you create a variable where every female has a 0 and every male has a 1, and treat that variable as if it were a measurement variable.
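
A tiny sketch of that dummy coding, with invented records:

```python
# Recode a nominal variable (sex) as 0/1 for use in multiple regression.
records = [
    {"sex": "female", "height": 160, "blood_pressure": 118},
    {"sex": "male",   "height": 175, "blood_pressure": 125},
    {"sex": "female", "height": 165, "blood_pressure": 120},
]

for record in records:
    # Treat the 0/1 code as if it were a measurement variable.
    record["sex_code"] = 0 if record["sex"] == "female" else 1

print([record["sex_code"] for record in records])  # [0, 1, 0]
```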

11. Simple Regression

Simple linear regression is a statistical method that allows us to summarize and study
relationships between two continuous (quantitative) variables: One variable, denoted x, is
regarded as the predictor, explanatory, or independent variable. The other variable,
denoted y, is regarded as the response, outcome, or dependent variable. Because the other
terms are used less frequently today, we'll use the "predictor" and "response" terms to
refer to the variables encountered in this course. The other terms are mentioned only to
make you aware of them should you encounter them. Simple linear regression gets its
adjective "simple," because it concerns the study of only one predictor variable.

12. Aggregate Price Index

An aggregate price index compares the prices for a group of commodities (called a market basket) at a given period of time to the prices paid for that group of commodities at a particular point in time in the past. The base period is the point in time in the past against
which all comparisons are made. In selecting the base period for a particular index, if
possible, you select a period of economic stability rather than one at or near the peak of an
expanding economy or the bottom of a recession or declining economy. In addition, the
base period should be relatively recent so that comparisons are not greatly affected by
changing technology and consumer attitudes and habits.

