Running head: PERCEIVED FITNESS 1

Perceived Fitness

Scale Development Project

Daemon Dias

1021316

PSYC *3250 (01)



This study was conducted to determine whether I had successfully created a

novel self-report scale that reliably and validly measures a person’s level of physical fitness. In

simpler terms, the goal of creating the test and collecting data for it was to determine whether the

test had good psychometric properties. The construct being assessed was perceived fitness. This

construct was chosen over other possible constructs because of the implications that could be

drawn from knowing one’s level of fitness. The literature demonstrates numerous benefits that

people could reap from keeping a moderate level of fitness. These benefits include being able to

identify pulmonary heart function, understand whether one is aging normally, and

identify risk for atrial fibrillation, to name a few. According to Ozbulut et al. (2013), having a

moderate level of fitness leads to increased pulmonary function and improved body composition

in patients with schizophrenia. Why do any of the things listed above matter? Truthfully, none of them

matter if one expects to live a short life. I am not suggesting that physical fitness guarantees

a longer lifespan; Arnold Schwarzenegger could have been hit by a truck when he was

twenty-two. I am suggesting, however, that fitness can make an individual’s organs and internal systems

operate more efficiently, thus leading to a better quality of life, however long that may be. Many

other factors, such as nutrition and relaxation, also contribute to a better quality of life.

Regardless of external factors, the goal of this test was to create a scale that lets an

individual easily identify their level of perceived fitness.

The test’s construct is defined as follows: "perceived fitness: identifies an

individual’s reaction time (response time to any given stimulus), strength (moving the maximum

amount of weight in one movement), flexibility (the maximum degree to which a person can

bend their limbs), and power (the ability to move as much weight as possible within the least

amount of time)". This definition will disagree with what some individuals think fitness is;

however, the literature suggests it is moderately accurate. According to Amesberger,

Finkenzeller, Würth, and Müller (2011), fitness is measured by endurance, balance, strength, and

power. In another article, by Keith, Stump, and Clark (2012), fitness is measured by the attributes (not

listed in the article) that allow an individual to be physically active. Interestingly, another article

presented fitness as an attribute that contributes to a larger construct known as self-actualization.

According to Glassman (2002), physical fitness is the development of an individual’s will.

Beyond that, Glassman (2002) states that fitness is a gateway for being all that you can be as you

journey towards self-actualization. While this test does not make any attempt to measure a

person’s self-actualization, it is plausible that the two constructs could be correlated.

The definition of perceived fitness used here overlaps with the literature on key

attributes, which include “reaction time, strength, flexibility, and power”. To

determine whether the test has achieved its goal of having good psychometric properties, the

study will analyze the perceived fitness scale. Item statistics, such as means and standard

deviations, will be computed in RStudio to give a broad picture of item responses (see

Table 1). Internal consistency will be assessed with Cronbach’s alpha to determine item

reliability, total scale reliability, and whether the scale was objectively successful. The

item-total correlations will be included in the analyses (see Table 1) to help evaluate the scale’s

construct validity.
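
For readers unfamiliar with the statistic, the sketch below shows one common way to compute Cronbach's alpha from a table of respondents by items. It is written in Python purely for illustration; the analyses in this paper were run in RStudio, and the function name `cronbach_alpha` and the toy responses are hypothetical, not the study's code or data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data.

    rows: one list of item scores per respondent (no missing values).
    """
    k = len(rows[0])  # number of items
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from four people to three 5-point items.
toy_rows = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
]
alpha = cronbach_alpha(toy_rows)
print(round(alpha, 2))
```

Alpha rises when items vary together, because the variance of the total score then grows faster than the sum of the individual item variances.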

Methods

Scale Creation

Creating this scale was quite complex. It took more thought and planning than most other

projects I do. The scale creation included various stages; in a sense, it was like writing an essay.

The process consisted of brainstorming, creating rough drafts, editing,

proofreading and receiving feedback, editing again, and, after a lengthy process, finalizing the scale in a

presentable manner.

Procedures.

Brainstorming was not overly complicated. For the construct, I chose something that was

interesting and relatable to me and that has the potential to be of interest to others. Choosing the

domains was a little more difficult than I initially thought. The issue I ran into here was

understanding what not to include. Include too many domains and one will inevitably be

measuring the same thing across domains, even though domains should be distinct from one another. After

brainstorming an initial pool of ten domains, I identified the overlapping domains and got rid of

any domains that I was unsure of. The next part of brainstorming was for the scale items. Since I

wanted to end up with about twelve scale items in total, I needed to generate at least 20

candidate items. This was difficult because I was not familiar with writing questions that a) are neither intrusive nor

open to being interpreted that way; b) have high content validity without testing; and c) are clear, simple,

and short, so as not to produce fatigue and to capture an individual’s true score as accurately

as possible. However, by referencing other self-report

questionnaires, I was able to familiarize myself with how I should be asking questions, creating

domains, and developing constructs. Doing this helped me filter out poor scale

items and domains.

Perhaps the most critical part of the scale creation process was receiving feedback on the

rough draft. Receiving feedback allowed me to identify the flaws and mistakes in the scale that I

was unable to see on my own. The problems that were fixed included the following: a)

a scale item that was almost identical to another (which is detrimental for short scales), b) overly complex

vocabulary, and c) an overly lengthy construct definition. As a consequence of receiving

this feedback, the scale improved considerably.

Instructions and Response Options.

Initially, the instructions for participants seemed perfectly clear. After submitting the draft and

receiving feedback from a teaching assistant, it became obvious that altering the participant

instructions would result in better responses.

Choosing the right number of response options was crucial. However, there was never

any confusion as to how many response options should be used. The perceived fitness

questionnaire used five response options. The logic behind choosing five response options

instead of three or seven was straightforward. It was hypothesized that for a scale of this nature, three

response options would be too close to dichotomous, and seven response options could lead to confusion.

The confusion expected with seven response options stems from the fact that

participants would be filling out multiple questionnaires and could experience fatigue. By

including five response options, it was believed that participants would answer each scale item

with higher accuracy than when presented with seven. This is also the reason why scale

items were not overly descriptive and did not include complicated vocabulary. The perceived

fitness questionnaire was designed to be simple enough for a fourth-grader to fill out.

Perhaps one of the most crucial decisions about the response options was to include a "neither

agree nor disagree" option. Including this allows a participant to communicate one of two things: 1)

they do not know where they stand regarding that scale item, or 2) they do not feel comfortable

answering that scale item and prefer to give a "null opinion" response.

Finalized Scale.

The perceived fitness scale consisted of 11 scale items, 4 domains (reaction time,

strength, flexibility, power), and one construct (perceived fitness). The self-report questionnaire

was easy to read and had clear instructions for the participants to follow. For each of the 11 scale items,

participants could choose from five possible response options. Response options ranged from 1

(strongly disagree) to 5 (strongly agree) (see Table 2). Lastly, none of the scale items

contained uncomfortable or disturbing language.

Data Collection

The data was successfully collected at the end of the psychometrics class on October 23,

2019. The self-report questionnaires were printed and handed out to students by the professor.

The students who participated in the study made a pile of the filled-out questionnaires for each

study being conducted that day. At the end of the fifty-minute class, the data was collected and

then converted and interpreted over the next several classes. After converting and interpreting

the data in RStudio, I found two errors in the data that had to be fixed. Initially, the study had

no reverse-coded items. However, after close inspection of the data, I noticed there were 3

negatively worded scale items (Q_1, Q_4, and Q_6; see Table 2) that needed to be reverse coded. The second error

was that subject 23 did not choose a response option for scale item number 3. This caused problems

when interpreting the data in RStudio, as correlations and means were coming up as NA. After

searching the internet, I was able to resolve this issue by including only

relevant data, that is, data with responses that could be interpreted.

Aside from the issues stated above, nobody’s responses had to be deleted.
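
The two fixes described above can be sketched as follows. This is a Python illustration only, not the RStudio code used for the study, and the paper does not state which R option was used to skip the missing response. The helper names `reverse_code` and `mean_ignoring_missing` and the example responses are hypothetical.

```python
def reverse_code(score, low=1, high=5):
    """Flip a Likert score so 1 becomes 5, 2 becomes 4, and so on."""
    if score is None:  # leave a skipped response as missing
        return None
    return (low + high) - score


def mean_ignoring_missing(scores):
    """Mean of the responses that are present; None marks a skipped item."""
    present = [s for s in scores if s is not None]
    return sum(present) / len(present)


# Hypothetical responses to one negatively worded item; the third person skipped it.
raw = [1, 4, None, 2]
recoded = [reverse_code(s) for s in raw]  # [5, 2, None, 4]
item_mean = mean_ignoring_missing(recoded)
```

On a 1 to 5 scale, subtracting each score from 6 keeps the midpoint at 3 while swapping the two ends, so agreement with a negatively worded item counts as low perceived fitness.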

Participants.

A total of 60 students enrolled in the "PSYC *3250 (01) F19" psychometrics class

at the University of Guelph were asked to participate and to give informed written consent to

take part in the study examining the development of psychometric scales. Participation in the

study was voluntary and lightly incentivized, counting for two percent of each student’s final grade

in the course. The students completed the questionnaires on October 23, 2019, during regular class time.

The participants were a mix of women and men; however, no exact numbers were

recorded for each sex. The age of participants was not collected either.

Considering the psychometrics class was a third-year course, it is reasonable to assume that the

students’ ages ranged from 19 to 21, with the exception of one or two outliers. Additionally, the

participants’ ethnic identity (e.g., Caucasian, Asian-Canadian, Middle Eastern) was not collected.

It is important to reiterate that such characteristics were not collected because the purpose of this

study was to examine whether the scale had good psychometric properties. Finally,

individuals were excluded from the study if they were not students in the PSYC

*3250 (01) F19 class at the University of Guelph.

Results

To determine whether the scale had good psychometric properties, the data

obtained were loaded into RStudio to compute descriptive and item-level statistics.

The analyses that were run to test the internal consistency and discrimination of the scale

include the following: a) a descriptive statistics analysis gathering the means and standard

deviations; b) a correlation matrix; c) a reliability analysis including Cronbach's alpha and “r.drop”;

d) a total score analysis; and e) an “apaTable” analysis. These analyses were crucial for interpreting the data

because running them allows us to determine the following: a) where people lie regarding the

dimensions of the scale and how spread out the data are for the test as a whole and for individual scale

items; b) the inter-item correlations, which provide a foundation for the test’s construct validity, that is, how

well the scale items represent a valid measure of the construct of interest; c) how

reliable the test is, based on the test mean, the test correlation, and the correlation of each item if

one of the items were dropped; and d) how confident we can be about a given item

correlation and how likely it is to be statistically significant.

The perceived fitness scale performed relatively well, given that it was the first time I had

attempted to construct a scale and that I did so informally, that is, skipping the steps where you test and refine

the scale to remove any non-performing scale items before finalizing it. Objectively, however,

the scale's reliability was not strong. Regarding internal consistency, Cronbach’s

alpha for the original eleven items was 0.68, with a lower bound of 0.57 and an upper bound of

0.80 (ɑ = 0.68, 95% CI [0.57, 0.80]), which indicates low reliability (see Table 1).

Unfortunately, a Cronbach's alpha under 0.70 is considered to indicate "questionable" reliability

(Gliem & Gliem, 2003). However, when the scale is examined by its individual parts (scale items),

several items perform well. For example, Q_5,

Q_6, and Q_11 all have item-total correlations above 0.70 (see Table 1).

Going deeper, we can examine the inter-item correlations.

Table 3 shows a few signs of strong consistency between items. For example, Q_5

and Q_6, which are oppositely worded versions of the same statement (see Table 2), have a correlation of 0.91. Additionally, the correlation matrix (see Table 3) shows

a mix of weak, moderate, and strong correlations between

scale items and domains. The means of the scale items ranged from 2.95 to 4.08, and the

standard deviations of the scale items ranged from 0.59 to 1.30 (see Table 1). Thus, on average,

people in the PSYC *3250 (01) F19 class had a relatively good perceived fitness score.

In conclusion, the scale performed moderately well when judged objectively on

its psychometric properties.

I chose not to delete any items from the scale, although the data suggest that

some items would have performed better if certain items had been dropped (see Table 1). For

example, scale items Q_2, Q_5, and Q_6 would have had higher item-total correlations.

Additionally, scale items Q_6 and Q_11 would have had better reliability scores if items had been

dropped (see Table 1).
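
To make the "if an item is dropped" statistics concrete, here is a minimal Python sketch of a corrected item-total correlation and of alpha recomputed without one item. It assumes that “r.drop” refers to the correlation between an item and the sum of the remaining items. The function names and the toy data are hypothetical, and this is not the RStudio code used for Table 1.

```python
from math import sqrt
from statistics import variance


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data (one list of item scores per person)."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def corrected_item_total(rows, j):
    """Correlation between item j and the sum of all the other items."""
    item = [row[j] for row in rows]
    rest = [sum(row) - row[j] for row in rows]
    return pearson(item, rest)


def alpha_if_dropped(rows, j):
    """Cronbach's alpha recomputed with item j removed."""
    reduced = [row[:j] + row[j + 1:] for row in rows]
    return cronbach_alpha(reduced)


# Hypothetical data: items 1 and 2 agree perfectly, item 3 is unrelated.
toy_rows = [
    [1, 1, 3],
    [2, 2, 1],
    [3, 3, 2],
]
r_drop_first = corrected_item_total(toy_rows, 0)
alpha_without_third = alpha_if_dropped(toy_rows, 2)
```

In this toy data the third item does not move with the other two, so removing it raises alpha. An item whose removal raises alpha is a natural candidate for revision or deletion.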

I did not attempt to improve or refine my scale in any way after the data was collected. I

did not do this for a few reasons. My time was limited between classes, studying, and

maintaining a balanced life. Additionally, my RStudio skills and knowledge are not yet sufficient to

refine and improve the scale using R code. Perhaps I will be able to carry

out such refinements the next time I create a psychometric scale.

Discussion

This study gathered data for its test by administering a self-report questionnaire to 60 University

of Guelph PSYC *3250 (01) F19 students. When these data were used to test the quality of the scale’s

psychometric properties, it was concluded that the scale could use some revision. Since the

scale's reliability (ɑ = 0.68, see Table 1) was questionable at best, the test has room for improvement.

Future Directions

To improve this scale further, several things could be done. For example, because this scale

needs higher reliability, it would be sensible to add more high-quality items and

response options to the scale. Including more items and response options measuring perceived

fitness would have provided more data and allowed for more potential spread, which could lead to

higher internal consistency and reliability.
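
One standard way to estimate how much extra items might help is the Spearman-Brown prophecy formula, sketched below in Python. This is an illustration added under a strong assumption, namely that any new items would behave like the existing ones. It is not an analysis carried out in this study, the formula concerns the number of items rather than the number of response options, and the function name `spearman_brown` is hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after multiplying the number of items by length_factor.

    Assumes the added items are comparable in quality to the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Starting from the observed alpha of 0.68, estimate the effect of doubling
# the scale from 11 to 22 comparable items.
predicted = spearman_brown(0.68, 2)
print(round(predicted, 2))
```

Under that assumption, doubling the scale would be expected to move reliability above the 0.70 threshold discussed in the Results, although weak items would not deliver the same gain.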

Future research conducted to improve the scale could recruit other undergraduate students, as

the sample used was non-random and could have been atypical. Unfortunately, with a self-report

questionnaire, it is difficult to ask highly precise questions such as "What is your VO2 max?"; in

this sense, the test will be limited. Additionally, having participants fill out only this

questionnaire, separately from the other 15+ questionnaires that were administered at the same time,

would help reduce boredom and fatigue. Lastly, the scale should be refined properly.

One such step would be to eliminate items that do not show high internal consistency with

other scale items and domains. It may also be beneficial to look at scales of unfitness, as these

may serve as a reference when developing items on the opposite dimension of fitness.



References

Amesberger, G., Finkenzeller, T., Würth, S., & Müller, E. (2011). Physical self-concept and

physical fitness in elderly individuals. Scandinavian Journal of Medicine & Science in

Sports, 21(Suppl. 1), 83-90.

Glassman, G. (2002). What is fitness. CrossFit Journal, 1(3), 1-11.

Gliem, J. A., & Gliem, R. R. (2003, January 1). Calculating, interpreting, and reporting

Cronbach’s alpha reliability coefficient for Likert-type scales. Retrieved December 1,

2019, from https://scholarworks.iupui.edu/handle/1805/344

Keith, N., Stump, T., & Clark, D. (2012). Developing a self-reported physical fitness survey.

Medicine and Science in Sports and Exercise, 44(7), 1388-1394.

Ozbulut, O., Genc, A., Bagcioglu, E., Coskun, K., Acar, T., Alkoc, O., . . . Ucok, K. (2013).

Evaluation of physical fitness parameters in patients with schizophrenia. Psychiatry

Research, 210(3), 806-811.

Thompson, P. (2015). Physical fitness, physical activity, exercise training, and atrial

fibrillation: First the good news, then the bad. Journal of the American College of

Cardiology, 66(9), 997-999.



Appendices

Perceived Fitness Data Collection

Table 1.

Mean, Standard Deviation, Item Total Correlations, and Item Reliability Index for final

Perceived Fitness scale Items (N = 60)

Item    Mean    Standard     Item-Total Correlations     Item Reliability Index
                Deviation    (correlation if an          (reliability if an
                             item is dropped)            item is dropped)

Q_1     4.07    0.86         0.23 (0.20)                 0.69 (0.085)
Q_2     4.08    0.59         0.47 (0.54)                 0.66 (0.39)
Q_3     3.93    0.78         0.34 (0.33)                 0.68 (0.21)
Q_4     3.18    1.23         0.58 (0.47)                 0.65 (0.41)
Q_5     3.32    1.11         0.73 (0.74)                 0.61 (0.61)
Q_6     3.35    1.15         0.76 (0.79)                 0.60 (0.66)
Q_7     3.00    1.16         0.06 (0.01)                 0.74 (0.14)
Q_8     3.85    1.30         0.43 (0.29)                 0.69 (0.22)
Q_9     4.00    0.90         0.58 (0.54)                 0.64 (0.46)
Q_10    2.95    1.20         0.44 (0.32)                 0.68 (0.25)
Q_11    3.00    1.25         0.76 (0.74)                 0.60 (0.64)

Note: All numbers are rounded to two decimal places. Cronbach’s coefficient alpha = 0.68

Table 2.

Scale for measuring perceived fitness among University of Guelph PSYC *3250 (01) F19

students

For each item, please circle the option that most accurately describes yourself:

1 2 3 4 5

Strongly Disagree Disagree Neither agree nor disagree Agree Strongly Agree

1. It takes me a long time to react when startled.

1 2 3 4 5

2. I react quickly to hazardous situations.

1 2 3 4 5

3. I have quick reflexes.

1 2 3 4 5

4. I struggle to open jars.

1 2 3 4 5

5. I can easily lift heavy objects.

1 2 3 4 5

6. I struggle to lift heavy objects.

1 2 3 4 5

7. I stretch often.

1 2 3 4 5

8. I can touch my toes.

1 2 3 4 5

9. I can maneuver my body easily.

1 2 3 4 5

10. I am a fast runner.

1 2 3 4 5

11. I can move heavy objects quickly.

1 2 3 4 5

Note: This was the finalized scale used for collecting data.

Table 3.

Correlation Matrix with confidence intervals

Variable  1            2            3            4            5            6            7            8            9            10

1. Q_1

2. Q_2    .39**
          [.15, .59]

3. Q_3    .18          .49**
          [-.08, .42]  [.27, .66]

4. Q_4    .00          .17          .12
          [-.25, .26]  [-.09, .40]  [-.14, .36]

5. Q_5    .05          .19          .04          .44**
          [-.21, .30]  [-.07, .42]  [-.21, .30]  [.21, .62]

6. Q_6    .13          .21          .12          .53**        .91**
          [-.13, .37]  [-.05, .44]  [-.14, .37]  [.32, .69]   [.85, .94]

7. Q_7    .08          .05          .04          -.23         -.05         -.16
          [-.17, .33]  [-.21, .30]  [-.22, .29]  [-.45, .03]  [-.30, .20]  [-.40, .09]

8. Q_8    .02          .13          .07          .07          .13          .12          -.06
          [-.23, .28]  [-.13, .37]  [-.19, .32]  [-.19, .32]  [-.13, .37]  [-.14, .36]  [-.31, .20]

9. Q_9    -.11         .25*         .07          .18          .25          .29*         -.08         .42**
          [-.35, .15]  [.00, .48]   [-.19, .32]  [-.07, .42]  [-.00, .48]  [.04, .51]   [-.33, .18]  [.19, .61]

10. Q_10  -.08         -.04         .03          .20          .06          .11          -.06         .16          .47**
          [-.33, .18]  [-.29, .21]  [-.22, .29]  [-.05, .43]  [-.19, .31]  [-.15, .36]  [-.31, .20]  [-.10, .40]  [.25, .65]

11. Q_11  -.03         .28*         .12          .46**        .72**        .72**        -.16         .14          .38**        .32*
          [-.28, .22]  [.02, .50]   [-.14, .37]  [.24, .64]   [.57, .82]   [.57, .82]   [-.40, .09]  [-.12, .38]  [.14, .58]   [.07, .53]

Note. Values in square brackets indicate the 95% confidence interval for each correlation. The
confidence interval is a plausible range of population correlations that could have caused the
sample correlation (Cumming, 2014). * indicates p < .05. ** indicates p < .01.
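
For context, the Python sketch below shows one standard way a 95% confidence interval for a correlation can be obtained, using the Fisher z-transformation. Whether the R table function used for Table 3 computes its intervals exactly this way is an assumption, as is the use of N = 60 complete responses. The function name `correlation_ci` is hypothetical.

```python
from math import atanh, sqrt, tanh


def correlation_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a Pearson correlation.

    Uses the Fisher z-transformation; n is the number of paired observations.
    """
    z = atanh(r)                # transform r to the z scale
    se = 1 / sqrt(n - 3)        # standard error on the z scale
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


# Correlation between Q_1 and Q_2 from Table 3, assuming all 60 responses were usable.
low, high = correlation_ci(0.39, 60)
print(f"[{low:.2f}, {high:.2f}]")
```

With these inputs the interval comes out very close to the one reported for Q_1 and Q_2. Small differences for other cells can arise from rounding of the reported correlations or from the missing response noted in the Data Collection section.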