SW Statistics Module
AY 2022-2023
MODULE IN
SOCIAL WORK STATISTICS
BY
TABLE OF CONTENTS
i. Introduction to Statistics
ii. Research Ethics
iii. Descriptive and Inferential Statistics: Overview
iv. Levels of Measurement
v. Reliability and Validity
vi. Statistical Tools
a. T-Test
b. ANOVA
c. Correlation
d. Regression
i. Introduction to Statistics
ETYMOLOGY
The term statistics comes from the New Latin "statisticum collegium" ("council of state") and
the Italian word "statista" ("statesman" or "politician"). The study originally focused on the
analysis of data about the state. Over the years, it acquired the broader meaning of "the
collection and classification of data in general." Its principal purpose was the collection of
data about states and localities for use by governmental and administrative bodies. In
particular, censuses provide frequently updated information about the population.
Activity 1
Based on the etymology of “Statistics” and its original intended purpose, what is the significance
of this study today? Answer in not more than five (5) sentences. [20 points]
Reference: https://psa.gov.ph/content/employment-situation-october-2020
The table above presents the latest Employment Rate of the Philippines as of October 2020.
The first row gives the number of people in the labor force, aged 15 and above, who are able
and willing to work. The Labor Force Participation Rate, on the other hand, pertains to the
willingness of the labor force to be employed, which is measured through workers' active
search for and application to jobs. The Employment Rate pertains to the percentage of the
labor force that is employed, while the Underemployment Rate refers to the portion of the
workforce employed in positions below their salary range or skill set. Lastly, the
Unemployment Rate indicates the percentage of the workforce unable to get a job. Based on
this table, what can you surmise?
Activity 2
Draft a program that will alleviate the identified problem from the data above.
OBJECTIVE/S:
BENEFICIARIES:
PROCEDURE:
STATISTICAL JARGON
Indicate whether the data is Accurate or Precise. Note that the true value is 100.
The mean (or average) is the most popular measure of central tendency. It can be used
with continuous data. The mean is equal to the sum of all the values in the data set divided
by the number of values in the data set.
The mean is essentially a model of your data set: it is the single value that best represents
all the others. You will notice, however, that the mean is not often one of the actual values
that you have observed in your data set. One of its important properties, though, is that it
minimizes error in the prediction of any one value in your data set. That is, it is the value
that produces the lowest amount of error compared with all other values in the data set.
An important property of the mean is that it includes every value in your data set as part
of the calculation. In addition, the mean is the only measure of central tendency where the
sum of the deviations of each value from the mean is always zero.
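These two properties can be checked directly in Python; the numbers below are a small made-up data set used only for illustration.

```python
# The mean is the sum of all values divided by the number of values.
data = [4, 8, 6, 5, 7]

mean = sum(data) / len(data)
print(mean)  # 6.0

# The sum of the deviations of each value from the mean is always zero.
deviations = [x - mean for x in data]
print(sum(deviations))  # 0.0
```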
The mean has one main disadvantage: it is particularly susceptible to the influence of
outliers. These are values that are unusual compared to the rest of the data set by being
especially small or large in numerical value. For example, consider the wages of staff at a
factory below:
Staff    1    2    3    4    5    6    7    8    9    10
Salary  15k  18k  16k  14k  15k  15k  12k  17k  90k  95k
The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests
that this mean value might not be the best way to accurately reflect the typical salary of a
worker, as most workers have salaries in the $12k to 18k range. The mean is being skewed
by the two large salaries. Therefore, in this situation, we would like to have a better
measure of central tendency. As we will find out later, taking the median would be a better
measure of central tendency in this situation.
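The salary example above can be verified with Python's standard `statistics` module, showing how the two large salaries pull the mean away from the typical worker (values in $k):

```python
import statistics

salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

print(statistics.mean(salaries))    # 30.7 -- skewed by the two outliers
print(statistics.median(salaries))  # 15.5 -- closer to a typical salary
```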
The median is the middle score for a set of data that has been arranged in order of
magnitude. The median is less affected by outliers and skewed data. In order to calculate
the median, suppose we have the data below:
65 55 89 56 35 14 56 55 87 45 92
We first need to rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89 92
Our median mark is the middle mark, in this case 56. It is the
middle mark because there are 5 scores before it and 5 scores after it. This works fine
when you have an odd number of scores, but what happens when you have an even
number of scores? What if you had only 10 scores? Well, you simply have to take the
middle two scores and average the result. So, if we look at the example below:
65 55 89 56 35 14 56 55 87 45
We again rearrange the data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89
The middle two scores are 55 and 56, so the median is (55 + 56) / 2 = 55.5.
35 36 37 39 22 18 50 31 40 32
88 80 95 100 78 91 97 34 50 21
65 87 45 92 77 82 79 67 99 80
75 65 77 21 23 45 60 68 32 10
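The two worked examples above (the odd-sized and even-sized score lists) can be recomputed with Python's `statistics` module:

```python
import statistics

# Odd number of scores: the median is the single middle score.
odd_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
print(statistics.median(odd_scores))   # 56

# Even number of scores: the median averages the middle two scores.
even_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]
print(statistics.median(even_scores))  # 55.5
```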
Another problem with the mode is that it will not provide us with a very good measure
of central tendency when the most common mark is far away from the rest of the data in
the data set, as depicted in the diagram below:
Level of Measurement   Appropriate Measure of Central Tendency
Nominal                Mode
Ordinal                Median
The second level of measurement is the ordinal level of measurement. This depicts
some ordered relationship among the variable’s observations. Suppose a student scores
the highest grade of 100 in the class. In this case, he would be assigned the first rank.
Then, another classmate scores the second highest grade of 92; she would be assigned the
second rank. A third student scores 81 and he would be assigned the third rank, and so on.
The ordinal level of measurement indicates an ordering of the measurements.
The third level of measurement is the interval level of measurement. It not only
classifies and orders the measurements, but also specifies that the distances between each
interval on the scale are equivalent along the scale, from low interval to high interval. For
example, in an interval-level measurement of anxiety, the interval between a student's
scores of 10 and 11 is the same as that between another student's scores of 40 and 41. A
popular example of this level of measurement is temperature in centigrade, where, for
example, the distance between 94°C and 96°C is the same as the distance between 100°C
and 102°C.
The fourth level of measurement is the ratio level of measurement. In this level of
measurement, the observations, in addition to having equal intervals, can have a value of
zero as well. The zero in the scale makes this type of measurement unlike the other types of
measurement, although the properties are similar to that of the interval level of
measurement. In the ratio level of measurement, the divisions between the points on the
scale have an equivalent distance between them.
Activity 5
Activity 6
Reliability is the extent to which test scores are consistent with respect to one or
more sources of inconsistency, such as the selection of specific questions, the selection
of raters, and the day and time of testing. Without good reliability, it is difficult to
trust that the data provided by the measure are an accurate representation of the
participant's performance, rather than the result of irrelevant artefacts of the testing
session such as environmental, psychological, or methodological factors.
What could be the reason for varying scores? Some possible reasons are the
following:
There are several types of reliability estimates, each influenced by different sources of
measurement error. Test developers have the responsibility of reporting the reliability
estimates that are relevant for a particular test. Before deciding to use a test, read the test
manual and any independent reviews to determine if its reliability is acceptable. The acceptable
level of reliability will differ depending on the type of test and the reliability estimate used.
The discussion in Table 2 should help you develop some familiarity with the different kinds of
reliability estimates reported in test manuals and reviews.
Types of Reliability Estimates
• Test-retest reliability indicates the repeatability of test scores with the passage of time.
This estimate also reflects the stability of the characteristic or construct being measured
by the test.
Some constructs are more stable than others. For example, an individual's reading
ability is more stable over a particular period of time than that individual's anxiety level.
Therefore, you would expect a higher test-retest reliability coefficient on a reading test
than you would on a test that measures anxiety.
• Alternate or parallel form reliability indicates how consistent test scores are likely to be
if a person takes two or more forms of a test.
A high parallel form reliability coefficient indicates that the different forms of the test
are very similar which means that it makes virtually no difference which version of the
test a person takes. On the other hand, a low parallel form reliability coefficient
suggests that the different forms are probably not comparable; they may be measuring
different things and therefore cannot be used interchangeably.
• Inter-rater reliability indicates how consistent test scores are likely to be if the test is
scored by two or more raters.
On some tests, raters evaluate responses to questions and determine the score.
Differences in judgments among raters are likely to produce variations in test scores. A
high inter-rater reliability coefficient indicates that the judgment process is stable and
the resulting scores are reliable.
Inter-rater reliability coefficients are typically lower than other types of reliability
estimates. However, it is possible to obtain higher levels of inter-rater reliabilities if
raters are appropriately trained.
• Internal consistency reliability indicates the extent to which items on a test measure
the same thing.
A high internal consistency reliability coefficient for a test indicates that the items on
the test are very similar to each other in content (homogeneous). It is important to note
that the length of a test can affect internal consistency reliability. For example, a very
lengthy test can spuriously inflate the reliability coefficient.
Tests that measure multiple characteristics are usually divided into distinct components.
Manuals for such tests typically report a separate internal consistency reliability
coefficient for each component in addition to one for the whole test.
Test manuals and reviews report several kinds of internal consistency reliability
estimates. Each type of estimate is appropriate under certain circumstances. The test
manual should explain why a particular estimate is reported.
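One widely used internal consistency estimate is Cronbach's alpha. The sketch below computes it from its standard formula on a small made-up table of item scores (rows are respondents, columns are items); the data and numbers are illustrative only.

```python
import statistics

def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)"""
    k = len(rows[0])                        # number of items on the test
    items = list(zip(*rows))                # one column of scores per item
    item_vars = [statistics.variance(col) for col in items]
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

scores = [
    [4, 5, 4],   # respondent 1's answers to items 1-3
    [2, 3, 2],   # respondent 2
    [5, 5, 4],   # respondent 3
    [1, 2, 1],   # respondent 4
]
print(round(cronbach_alpha(scores), 2))  # 0.99 -- items are highly homogeneous
```

A value near 1 indicates that the items measure the same thing; note, as the text warns, that a very long test can inflate this coefficient.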
Validity is the most important issue in selecting a test. Validity refers to what characteristic the
test measures and how well the test measures that characteristic.
• Validity tells you if the characteristic being measured by a test is related to what you
intend to measure.
• Validity gives meaning to the test scores. For example, validity evidence indicates that
there is linkage between test performance and job performance. It can tell you what
you may conclude or predict about someone from his or her score on the test. If a test
has been demonstrated to be a valid predictor of performance on a specific job, you can
conclude that persons scoring high on the test are more likely to perform well on the job
than persons who score low on the test, all else being equal.
• Validity also describes the degree to which you can make specific conclusions or
predictions about people based on their test scores. In other words, it indicates the
usefulness of the test.
Reliability versus validity. Validity will tell you how good a test is for a particular situation;
reliability will tell you how trustworthy a score on that test will be. You cannot draw valid
conclusions from a test score unless you are sure that the test is reliable. Even when a test
is reliable, it may not be valid. You should be careful that any test you select is both reliable
and valid for your situation.
Similarly, a test's validity is established in reference to specific groups. These groups are called
the reference groups. The test may not be valid for different groups. For example, a test designed
to predict the performance of managers in situations requiring problem solving may not allow
you to make valid or meaningful predictions about the performance of clerical employees. If, for
example, the kind of problem-solving ability required for the two positions is different, or the
reading level of the test is not suitable for clerical applicants, the test results may be valid for
managers, but not for clerical employees.
Test developers have the responsibility of describing the reference groups used to develop the
test. The manual should describe the groups for whom the test is valid, and the interpretation of
scores for individuals belonging to each of these groups. You must determine if the test can be
used appropriately with the people you want to test. This group of people is called your target
population or target group.
Your target group and the reference group do not have to match on all factors; they must be
sufficiently similar so that the test will yield meaningful scores for your group. For example, a
writing ability test developed for use with college seniors may be appropriate for measuring the
writing ability of white-collar professionals or managers, even though these groups do not have
identical characteristics. In determining the appropriateness of a test for your target groups,
consider factors such as occupation, reading level, cultural differences, and language barriers.
In order to be certain an employment test is useful and valid, evidence must be collected
relating the test to a job. The process of establishing the job relatedness of a test is
called validation.
Methods for conducting validation studies
Conducting your own validation study is expensive, and, in many cases, you may not have
enough employees in a relevant job category to make it feasible to conduct a study. Therefore,
you may find it advantageous to use professionally developed assessment tools and procedures
for which documentation on validity already exists. However, care must be taken to make sure
that validity evidence obtained for an "outside" test study can be suitably "transported" to your
particular situation.
The Uniform Guidelines, the Standards, and the SIOP Principles state that evidence of
transportability is required. Consider the following when using outside tests:
• Validity evidence. The validation procedures used in the studies must be consistent with
accepted standards.
• Job similarity. A job analysis should be performed to verify that your job and the original
job are substantially similar in terms of ability requirements and work behavior.
• Fairness evidence. Reports of test fairness from outside studies must be considered for
each protected group that is part of your labor market. Where this information is not
available for an otherwise qualified test, an internal study of test fairness should be
conducted, if feasible.
• Other significant variables. These include the type of performance measures and
standards used, the essential work activities performed, the similarity of your target
group to the reference samples, as well as all other situational factors that might affect
the applicability of the outside test for your use.
a. T-Test
Essentially, a t-test allows us to compare the average values of two data sets
and determine whether they came from the same population. For instance, suppose you
would like to know which mode of teaching, visual or auditory, is best for grade one
students. You taught Section A visually, then conducted an exam; you then taught Section B
auditorily and tested them as well.
If the null hypothesis is rejected, it indicates that the observed difference is
strong and is probably not due to chance.
TYPES OF T-TEST
T- Test Dependent: 1 sample; 2 scores
For instance, you want to know whether your students are actually learning from your
lessons. First you conduct a diagnostic exam on the lesson you are about to teach;
then, after the discussion, you administer the same test to see whether their scores
increased, remained the same, or decreased.
            PRE-TEST   POST-TEST
Harry          84          90
Ron            80          89
Hermione       90         100
In this study, the condition being tested is the discussion of the lesson. We can
compare the scores based on the table; however, we can never be sure whether the results
are significant without running the data through a t-test.
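The dependent (paired) t-test on the pre-test/post-test scores above can be sketched by hand with the standard formula t = mean(d) / (sd(d) / sqrt(n)), where d is the post-minus-pre difference for each student:

```python
import math
import statistics

pre  = [84, 80, 90]    # Harry, Ron, Hermione before the lesson
post = [90, 89, 100]   # the same three students after the lesson

diffs = [b - a for a, b in zip(pre, post)]      # [6, 9, 10]
n = len(diffs)
t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
print(round(t, 2))  # 6.93, with n - 1 = 2 degrees of freedom
```

With 2 degrees of freedom the two-tailed critical value at the 0.05 level is about 4.30, so a t of 6.93 would lead us to reject the null hypothesis: the score increase is unlikely to be due to chance.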
The unequal-variance t-test (Welch's t-test), on the other hand, is used when the two
groups being compared have unequal variances, which often goes hand in hand with unequal
numbers of individuals in each group. For instance, you want to compare the frequency of
political interaction through social media comments between millennials and Generation Z.
Millennials   Generation Z
     4              6
     7              4
     6              1
     9              9
     1              7
     2              5
     0
     3
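Welch's unequal-variance t-test on the two groups above can be sketched with the formula t = (m1 - m2) / sqrt(v1/n1 + v2/n2), where m, v, and n are each group's mean, sample variance, and size:

```python
import math
import statistics

millennials  = [4, 7, 6, 9, 1, 2, 0, 3]   # comment counts, 8 people
generation_z = [6, 4, 1, 9, 7, 5]          # comment counts, 6 people

m1, m2 = statistics.mean(millennials), statistics.mean(generation_z)
v1, v2 = statistics.variance(millennials), statistics.variance(generation_z)
n1, n2 = len(millennials), len(generation_z)

t = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
print(round(t, 2))  # -0.85 -- a small t, so likely not significant
```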
GUIDE
Activity 7
Give at least TWO (2) examples each of t-test dependent, t-test independent equal
variance and t-test independent unequal variance.
Activity 8
b. ANOVA
ANOVA is an extension of the t-test, since the t-test can only compare TWO
groups of data. There are two types of ANOVA: One-Way ANOVA and Two-Way
ANOVA.
One-Way ANOVA is extremely similar to the t-test; the only difference is that it can
compare more than two groups. For example, you want to know which among the
genders sends more text messages in a month.
Two-Way ANOVA, on the other hand, differs in that it involves two IVs (factors). For
example, you want to know which medicine best relieves insomnia and whether its effect
depends on the patient's age group. In this case, your IVs are the medicine (placebo and
drug) and the age group, and your DV is the quality of sleep.
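A one-way ANOVA can be sketched from its definition, F = MS-between / MS-within, on three small made-up groups (e.g. monthly text-message counts for three groups; the numbers are illustrative only):

```python
import statistics

groups = [
    [2, 3, 4],     # group 1
    [5, 6, 7],     # group 2
    [8, 9, 10],    # group 3
]

all_values = [x for g in groups for x in g]
grand_mean = statistics.mean(all_values)
k, n = len(groups), len(all_values)

# Variation of the group means around the grand mean, weighted by group size.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
# Variation of individual scores around their own group mean.
ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)

f = (ss_between / (k - 1)) / (ss_within / (n - k))
print(round(f, 2))  # 27.0 -- compared against the F(2, 6) critical value
```

A large F means the groups differ far more between themselves than individuals do within each group.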
c. Correlation
Activity 9
d. Regression
If the regression coefficient is -0.71, it means that for every one-unit increase
in the students' academic achievement, there is a corresponding 0.71-unit
decrease in their happiness.
Activity 10
Draft a mini research plan that will employ one of the discussed statistical tools.
Gather data and present the results.