Fitness Scale Development Project

Running head: PERCEIVED FITNESS 1
Perceived Fitness
Scale Development Project
Daemon Dias
1021316
PSYC *3250 (01)

This study was run to determine whether I had successfully created a novel self-report scale
that reliably and validly measures a person’s perceived level of physical fitness. In simpler
terms, the goal of creating the test and collecting data for it was to determine whether the
test had good psychometric properties. The construct being assessed was perceived fitness. This
construct was chosen over other possible constructs because of the implications that can be
drawn from knowing one’s level of fitness. The literature demonstrates numerous benefits that
people can reap from keeping a moderate level of fitness. Knowing one’s fitness level can help,
for example, in gauging pulmonary and heart function, judging whether one is aging normally, and
identifying risk for atrial fibrillation. According to Ozbulut et al. (2013), having a moderate
level of fitness leads to increased pulmonary function and improved body composition in patients
with schizophrenia. Why do any of the things listed above matter? Truthfully, none of them
matter if one expects to live a short life. I am not suggesting that physical fitness guarantees
a longer lifespan; Arnold Schwarzenegger could have been hit by a truck when he was twenty-two.
I am suggesting, though, that fitness can make an individual’s organs and internal systems
operate more efficiently, thus leading to a better quality of life, however long that may be.
Many other factors, such as nutrition and relaxation, also contribute to a better quality of
life. Regardless of these external factors, the goal of this test was to create a scale that
lets an individual easily identify their level of perceived fitness.
The test’s construct is defined as follows: "perceived fitness: identifies an individual’s
reaction time (response time to any given stimulus), strength (moving the maximum amount of
weight in one movement), flexibility (the maximum degree to which a person can bend their
limbs), and power (the ability to move as much weight as possible within the least amount of
time)". This definition may differ from what some individuals think fitness is; however, the
literature suggests it is moderately accurate. According to Amesberger, Finkenzeller, Würth, and
Müller (2011), fitness is measured by endurance, balance, strength, and power. In another
article, Keith, Stump, and Clark (2012) measure fitness by the attributes (not listed in the
article) that allow an individual to be physically active. Peculiarly, another article presents
fitness as an attribute that contributes to a larger construct known as self-actualization.
According to Glassman (2002), physical fitness involves developing the will of an individual.
Beyond that, Glassman (2002) states that fitness is a gateway to being all that you can be as
you journey towards self-actualization. While this test makes no attempt to measure a person’s
self-actualization, it is plausible that the two constructs are correlated.
The definition of perceived fitness used here (reaction time, strength, flexibility, and
power) therefore overlaps with the definitions found in the literature, most directly on
strength and power. To determine whether the test has achieved its goal of having good
psychometric properties, the study analyzes the perceived fitness scale. Item statistics, such
as means and standard deviations, are computed in RStudio to give a broad picture of the item
responses (see Table 1). Cronbach’s alpha is computed to assess the internal consistency of the
items and of the total scale, and to judge whether the scale was objectively successful.
Item-total correlations are also included in the analyses (see Table 1) as evidence bearing on
the scale’s construct validity.
Methods
Scale Creation
Creating this scale was quite complex. It took more thought and planning than most other
projects I do. The scale creation included various stages; in a sense, it is like writing an
essay. It consisted of brainstorming, creating rough drafts, editing, proofreading and receiving
feedback, editing again, and, after a lengthy process, finalizing the scale in a presentable
manner.
Procedures.
Brainstorming was not overly complicated. For the construct, I chose something that was
interesting and relatable to me and that has the potential to interest others. Choosing the
domains was a little more difficult than I initially thought. The issue I ran into was
understanding what not to include. With too many domains, one inevitably ends up measuring the
same thing in more than one domain, whereas domains are meant to be mutually exclusive. After
brainstorming an initial pool of ten domains, I identified the overlapping domains and removed
any domains that I was unsure of. The next part of brainstorming was for the scale items. Since
I wanted to end up with about twelve scale items in total, I needed to generate at least 20
candidate items. This was difficult, because I was not familiar with writing questions that: a)
are not intrusive and could not be interpreted that way; b) have high content validity without
testing; and c) are clear, simple, and short, so that they do not produce fatigue and capture an
individual’s true score as accurately as possible. However, by referencing other self-report
questionnaires I was able to familiarize myself with how to ask questions, create domains, and
develop constructs. Doing this helped me filter out poor scale items and domains.
Perhaps the most critical part of the scale creation process was receiving feedback on the
rough draft. Receiving feedback allowed me to identify flaws and mistakes in the scale that I
was unable to see on my own. The problems that were fixed included the following: a) two almost
identical scale items (which is detrimental for a short scale), b) overly complex vocabulary,
and c) a construct definition that was overly lengthy. As a consequence of this feedback, the
scale improved considerably.
Instructions and Response Options.
Initially, the instructions for participants seemed clear as day. After I submitted the draft
and received feedback from a Teaching Assistant, it became obvious that altering the participant
instructions would result in better responses.
Choosing the right number of response options was crucial. However, there was never any
confusion as to how many response options should be used. The perceived fitness questionnaire
used five response options. The logic behind choosing five response options instead of three or
seven was straightforward. It was hypothesized that, for a scale of this nature, three response
options would be too close to dichotomous, and seven response options could lead to confusion.
The reasoning for confusion with seven response options rests on the fact that participants
would be filling out multiple questionnaires and could experience fatigue. With five response
options, it was believed that participants would answer each scale item more accurately than
when presented with seven. This is also why scale items were not overly descriptive and did not
include complicated vocabulary. The perceived fitness questionnaire was designed to be simple
enough for a fourth-grader to fill out.
One of the most important decisions about the response options was to include a "neither
agree nor disagree" option. Including it allows a participant to communicate one of two things:
1) they do not know where they stand regarding that scale item, or 2) they do not feel
comfortable answering that scale item and prefer to give a "null opinion" response.
Finalized Scale.
The perceived fitness scale consisted of 11 scale items, 4 domains (reaction time,
strength, flexibility, power) and one construct (perceived fitness). The self-report questionnaire
was easy to read and had clear instructions for the participants to follow. For each of the 11
scale items, participants chose from five response options, ranging from 1 (strongly disagree)
to 5 (strongly agree) (see Table 2). Lastly, none of the scale items contained uncomfortable or
disturbing questions or language.
Data Collection
The data were collected at the end of the psychometrics class on October 23, 2019. The
self-report questionnaires were printed and handed out to students by the professor. The
students who participated made a pile of the filled-out questionnaires for each study being
conducted that day. At the end of the fifty-minute class, the questionnaires were collected, and
the data were then entered and interpreted over the next several classes. Once the data were in
RStudio, two problems had to be fixed. First, the study was initially designed with no
reverse-coded items; however, after close inspection of the data I noticed that three negatively
worded scale items (Q_1, Q_4, and Q_6) needed to be reverse coded. Second, subject 23 did not
choose a response option for scale item 3. This caused problems in RStudio, as correlations and
means were returned as NA. After searching the internet, I resolved this issue by including only
relevant data, that is, only the responses that could be interpreted. Aside from the issues
stated above, no participant’s responses had to be deleted.
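
The two fixes described above can be illustrated with a minimal sketch. It is written in Python rather than the RStudio code that was actually used, the three example participants are hypothetical, and it assumes that "only including relevant data" means skipping missing responses when a statistic is computed (comparable to na.rm = TRUE in R).

```python
import math

SCALE_MIN, SCALE_MAX = 1, 5
REVERSED_ITEMS = {"Q_1", "Q_4", "Q_6"}  # the negatively worded items named in the text


def reverse_code(score):
    """Flip a 1-5 response (1 <-> 5, 2 <-> 4); a missing value stays None."""
    if score is None:
        return None
    return SCALE_MIN + SCALE_MAX - score


def recode(response):
    """Return a copy of one participant's answers with reversed items flipped."""
    return {
        item: reverse_code(score) if item in REVERSED_ITEMS else score
        for item, score in response.items()
    }


def mean_observed(values):
    """Mean of the non-missing values only (assumed handling of skipped items)."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed) if observed else math.nan


# Hypothetical answers from three participants; None marks a skipped item.
responses = [
    {"Q_1": 2, "Q_3": 4},
    {"Q_1": 1, "Q_3": None},
    {"Q_1": 4, "Q_3": 5},
]
recoded = [recode(r) for r in responses]
q1_mean = mean_observed([r["Q_1"] for r in recoded])
q3_mean = mean_observed([r["Q_3"] for r in recoded])
```

On a five-point scale, reverse coding amounts to subtracting the response from 6, so that a high score always means higher perceived fitness.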
Participants.
A total of 60 students enrolled in the "PSYC *3250 (01) F19" psychometrics class at the
University of Guelph were asked to participate and to give informed written consent to take part
in a study examining the development of psychometric scales. Participation in the study was
voluntary and lightly incentivized, making up two percent of each student’s final grade in the
course. The students completed the questionnaires on October 23, 2019, during regular class
time.
The participants were a mix of women and men; however, the number of each was not
recorded. The age of participants was not collected either. Considering that the psychometrics
class was a third-year course, it is reasonable to assume that the students’ ages ranged from
about 19 to 21, with the exception of one or two outliers. Additionally, the participants’
identity (e.g., Caucasian, Asian-Canadian, Middle-Eastern) was not collected. It is important to
reiterate that these properties were not collected because the purpose of this study was only to
examine whether the scale had good psychometric properties. Only students of the PSYC *3250 (01)
F19 class at the University of Guelph were eligible to participate.
Results
To determine whether the scale contained good psychometric properties, the data obtained
were loaded into RStudio to compute descriptive and item-level statistics.
The analyses that were run to test the internal consistency and discrimination of the scale
include the following: a) descriptive statistics, gathering the means and standard deviations;
b) a correlation matrix; c) a reliability analysis, including Cronbach’s alpha and "r.drop";
d) a total score analysis; e) correlation tables produced with "apaTables". These analyses were
crucial for interpreting the data, because they show the following: a) where people lie on the
dimensions of the scale and how spread out the data are, for the test as a whole and for
individual scale items; b) the inter-item correlations, which provide a foundation for the
test’s construct validity, that is, how well the scale items represent a valid measure of the
construct of interest; c) how reliable the test is, based on the test mean, the overall alpha,
and each item’s correlation with the rest of the scale when that item is dropped; d) how
confident we can be about a given item correlation and whether it is statistically significant.
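
As a hedged illustration of the reliability analysis in c), the sketch below computes Cronbach’s alpha and an item-rest correlation (the quantity reported as "r.drop") from first principles. It is written in Python, not the R code used for this study, and the small data set is hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """items: one list of scores per scale item, all of the same length."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]      # total score per participant
    item_var_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_rest_correlation(items, index):
    """Correlate one item with the sum of all the other items."""
    rest = [sum(row) - row[index] for row in zip(*items)]
    return pearson_r(items[index], rest)


# Hypothetical data: 3 items answered by 5 participants.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(items)
r_drop_first = item_rest_correlation(items, 0)
```

Alpha rises when the items vary together, because the variance of the total score then becomes large relative to the sum of the individual item variances.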
The perceived fitness scale performed relatively well, given that it was the first time I had
attempted to construct a scale and that I did so informally, that is, skipping the steps in
which the scale is tested and refined to remove non-performing items before it is finalized.
Objectively, however, the scale’s reliability was not strong. Cronbach’s alpha for the original
eleven items was 0.68 (α = 0.68, 95% CI [0.57, 0.80]), which indicates lower reliability (see
Table 1). A Cronbach’s alpha between 0.60 and 0.70 is considered to indicate "questionable"
reliability (Gliem & Gliem, 2003). When the scale is examined at the level of its individual
items, however, some items perform well. For example, Q_5, Q_6, and Q_11 all have item-total
correlations above 0.70 (see Table 1). Going even deeper, we can examine the inter-item
correlations. Table 3 shows some very strong relationships; for example, Q_5 and Q_6 have a
correlation of 0.91, which is to be expected because the two items are opposite wordings of the
same statement. Overall, the correlation matrix (see Table 3) shows weak, moderate, and strong
correlations between scale items, with the strongest among the items about lifting and moving
heavy objects (Q_5, Q_6, and Q_11). The means of the scale items ranged from 2.95 to 4.08, and
the standard deviations of the scale items ranged from 0.59 to 1.30 (see Table 1). Thus, on
average, people in the PSYC *3250 (01) F19 class had a relatively good perceived fitness score.
In conclusion, the scale performed moderately when viewed objectively in terms of its
psychometric properties.
I chose not to delete any items from the scale, although the data suggest that some items
would have performed better if certain items had been dropped (see Table 1). For example, scale
items Q_2, Q_5, and Q_6 would have had higher item-total correlations. Additionally, scale items
Q_6 and Q_11 would have had better reliability scores if items had been dropped (see Table 1).
I did not attempt to improve or refine my scale in any way after the data were collected,
mainly because my time was limited between classes, studying, and maintaining a balanced life.
Additionally, my RStudio skills and knowledge are not yet strong enough to refine and improve
the scale through code alone. Perhaps I will be able to do so the next time I create a
psychometric scale.
Discussion
This study gathered data by administering a self-report questionnaire to 60 University of
Guelph PSYC *3250 (01) F19 students. When these data were used to test the quality of the
scale’s psychometric properties, it was concluded that the scale needs some revision. Since the
scale’s reliability (α = 0.68, see Table 1) was questionable at best, the test has room for
improvement.
Future Directions
To improve this scale further, several things could be done. For example, since this scale
needs higher reliability, it would be sensible to add more high-quality items, and possibly more
response options, to the scale. More items and response options measuring perceived fitness
would provide more data and allow for more spread in scores, which could lead to higher internal
consistency and reliability.
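
The expected gain from adding items can be estimated with the Spearman-Brown prophecy formula. The sketch below is a rough illustration in Python: it assumes that any new items are comparable in quality to the existing ones and treats the reported alpha of 0.68 as the scale’s reliability, both of which are approximations. It says nothing about the effect of adding response options.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is made length_factor times longer."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


def length_factor_needed(reliability, target):
    """How many times longer the test must be to reach the target reliability."""
    return (target * (1 - reliability)) / (reliability * (1 - target))


current_alpha = 0.68   # reported in Table 1
current_items = 11     # items in the finalized scale

doubled = spearman_brown(current_alpha, 2)           # prediction for 22 items
factor = length_factor_needed(current_alpha, 0.80)   # target of 0.80 is an example
items_needed = current_items * factor
```

Under these assumptions, doubling the scale to 22 items would be predicted to raise reliability to roughly 0.81, and about 21 comparable items would be needed to reach 0.80.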
Future research to improve the scale could draw on other undergraduate students, as the
sample used was non-random and could have been atypical. Unfortunately, with a self-report
questionnaire, it is difficult to ask highly precise questions such as "What is your VO2 max?";
in this sense the test will always be limited. Additionally, having participants fill out only
this questionnaire, separately from the other 15+ questionnaires that were administered at the
same time, would help reduce boredom and fatigue. Lastly, the scale should be refined properly:
a future step would be to eliminate items that do not show high internal consistency with the
other scale items and domains. It may also be beneficial to look at scales of unfitness, as
these may serve as a reference when developing items on the opposite end of the fitness
dimension.

References
Amesberger, G., Finkenzeller, T., Würth, S., & Müller, E. (2011). Physical self-concept and
physical fitness in elderly individuals. Scandinavian Journal of Medicine & Science in
Sports, 21(Suppl. 1), 83-90.
Glassman, G. (2002). What is fitness. CrossFit Journal, 1(3), 1-11.
Gliem, J. A., & Gliem, R. R. (2003, January 1). Calculating, interpreting, and reporting
Cronbach’s alpha reliability coefficient for Likert-type scales. Retrieved December 1,
2019, from https://scholarworks.iupui.edu/handle/1805/344
Keith, N., Stump, T., & Clark, D. (2012). Developing a self-reported physical fitness survey.
Medicine and Science in Sports and Exercise, 44(7), 1388-1394.
Ozbulut, O., Genc, A., Bagcioglu, E., Coskun, K., Acar, T., Alkoc, O., . . . Ucok, K. (2013).
Evaluation of physical fitness parameters in patients with schizophrenia. Psychiatry
Research, 210(3), 806-811.
Thompson, P. (2015). Physical fitness, physical activity, exercise training, and atrial
fibrillation: First the good news, then the bad. Journal of the American College of
Cardiology, 66(9), 997-999.

Appendices
Perceived Fitness Data Collection
Table 1.
Mean, Standard Deviation, Item Total Correlations, and Item Reliability Index for final
Perceived Fitness scale Items (N = 60)
Item    Mean    Standard     Item-Total Correlations     Item Reliability Index
                Deviation    (correlation if an item     (reliability if an item
                             is dropped)                 is dropped)
Q_1     4.07    0.86         0.23 (0.20)                 0.69 (0.085)
Q_2     4.08    0.59         0.47 (0.54)                 0.66 (0.39)
Q_3     3.93    0.78         0.34 (0.33)                 0.68 (0.21)
Q_4     3.18    1.23         0.58 (0.47)                 0.65 (0.41)
Q_5     3.32    1.11         0.73 (0.74)                 0.61 (0.61)
Q_6     3.35    1.15         0.76 (0.79)                 0.60 (0.66)
Q_7     3.00    1.16         0.06 (0.01)                 0.74 (0.14)
Q_8     3.85    1.30         0.43 (0.29)                 0.69 (0.22)
Q_9     4.00    0.90         0.58 (0.54)                 0.64 (0.46)
Q_10    2.95    1.20         0.44 (0.32)                 0.68 (0.25)
Q_11    3.00    1.25         0.76 (0.74)                 0.60 (0.64)
Note: All numbers are rounded to two decimal places. Cronbach’s coefficient alpha = 0.68
Table 2.
Scale for measuring perceived fitness among University of Guelph PSYC *3250 (01) F19
students
For each item, please circle the option that most accurately describes yourself:
1 = Strongly Disagree, 2 = Disagree, 3 = Neither agree nor disagree, 4 = Agree, 5 = Strongly Agree
1. It takes me a long time to react when startled.
1 2 3 4 5
2. I react quickly to hazardous situations.
1 2 3 4 5
3. I have quick reflexes.
1 2 3 4 5
4. I struggle to open jars.
1 2 3 4 5
5. I can easily lift heavy objects.
1 2 3 4 5
6. I struggle to lift heavy objects.
1 2 3 4 5
7. I stretch often.
1 2 3 4 5
8. I can touch my toes.
1 2 3 4 5
9. I can maneuver my body easily.
1 2 3 4 5
10. I am a fast runner.
1 2 3 4 5
11. I can move heavy objects quickly.
1 2 3 4 5
Note: This was the finalized scale used for collecting data.
Table 3.
Correlation Matrix with confidence intervals
Variable  1            2            3            4            5            6            7            8            9            10
1. Q_1

2. Q_2    .39**
          [.15, .59]

3. Q_3    .18          .49**
          [-.08, .42]  [.27, .66]

4. Q_4    .00          .17          .12
          [-.25, .26]  [-.09, .40]  [-.14, .36]

5. Q_5    .05          .19          .04          .44**
          [-.21, .30]  [-.07, .42]  [-.21, .30]  [.21, .62]

6. Q_6    .13          .21          .12          .53**        .91**
          [-.13, .37]  [-.05, .44]  [-.14, .37]  [.32, .69]   [.85, .94]

7. Q_7    .08          .05          .04          -.23         -.05         -.16
          [-.17, .33]  [-.21, .30]  [-.22, .29]  [-.45, .03]  [-.30, .20]  [-.40, .09]

8. Q_8    .02          .13          .07          .07          .13          .12          -.06
          [-.23, .28]  [-.13, .37]  [-.19, .32]  [-.19, .32]  [-.13, .37]  [-.14, .36]  [-.31, .20]

9. Q_9    -.11         .25*         .07          .18          .25          .29*         -.08         .42**
          [-.35, .15]  [.00, .48]   [-.19, .32]  [-.07, .42]  [-.00, .48]  [.04, .51]   [-.33, .18]  [.19, .61]

10. Q_10  -.08         -.04         .03          .20          .06          .11          -.06         .16          .47**
          [-.33, .18]  [-.29, .21]  [-.22, .29]  [-.05, .43]  [-.19, .31]  [-.15, .36]  [-.31, .20]  [-.10, .40]  [.25, .65]

11. Q_11  -.03         .28*         .12          .46**        .72**        .72**        -.16         .14          .38**        .32*
          [-.28, .22]  [.02, .50]   [-.14, .37]  [.24, .64]   [.57, .82]   [.57, .82]   [-.40, .09]  [-.12, .38]  [.14, .58]   [.07, .53]
Note. Values in square brackets indicate the 95% confidence interval for each correlation. The
confidence interval is a plausible range of population correlations that could have caused the
sample correlation (Cumming, 2014). * indicates p < .05. ** indicates p < .01.
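
Confidence intervals of this kind are commonly obtained with the Fisher z transformation. The sketch below is written in Python and assumes that this is how the intervals in Table 3 were produced; it uses the Q_1 and Q_2 correlation (.39) with N = 60 as the example.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation (Fisher z)."""
    z = math.atanh(r)                    # transform r to the z scale
    se = 1 / math.sqrt(n - 3)            # standard error on the z scale
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


low, high = correlation_ci(0.39, 60)
```

Rounded to two decimals, this gives [.15, .59], which matches the interval reported for Q_1 and Q_2 in Table 3.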
