
Math and Statistics II

Individual Assignment 1
Vito Andjelic 24. 3. 2022

62005147

You have collected the following data:

Age of male students:


21, 25, 17, 22, 45, 18, 20, 19, 22

Age of female students:


17, 35, 19, 18, 25, 33, 22, 21, 20

1) Which scale type would you assign these data sets to?

Ratio

2) Calculate the following values for male students and female students separately:

The mean (average value) of the female ages is 23.33, and that of the male ages is 23.22.
The median (the middle value when the ages are arranged from lowest to highest) is 21.00 for both genders.
The mode (most common value) gave no output for the female data set because every age occurs exactly
once, so there is no single most common value; for the male data set the most commonly occurring value
is 22.
The standard deviation is a numerical value that describes how far the values are spread out around the
mean. For the female ages it is 6.5 years and for the male ages it is 8.51 years, which means that the male
ages are more spread out.
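
As a cross-check, the values above can be reproduced with a short Python sketch. It uses only the standard library; the function name describe and the rule for reporting "no mode" are illustrative choices, and the standard deviation is assumed to be the sample version (dividing by n - 1), which is what matches the reported 8.51 and 6.5.

```python
import statistics

male = [21, 25, 17, 22, 45, 18, 20, 19, 22]
female = [17, 35, 19, 18, 25, 33, 22, 21, 20]

def describe(ages):
    """Return mean, median, modes, and sample standard deviation of a list of ages."""
    modes = statistics.multimode(ages)
    # If every value occurs exactly once, multimode returns all of them;
    # report that case as "no mode" (an empty list).
    if len(ages) > 1 and len(modes) == len(ages):
        modes = []
    return {
        "mean": round(statistics.mean(ages), 2),
        "median": statistics.median(ages),
        "modes": modes,
        "sd": round(statistics.stdev(ages), 2),  # sample SD, divides by n - 1
    }

male_stats = describe(male)
female_stats = describe(female)
print("male:", male_stats)
print("female:", female_stats)
```

The empty mode list for the female ages corresponds to the missing output mentioned above.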

Open the dataset “assignment 1.sav” in PSPP.

It contains three variables (variable view), a nominal (e.g. gender), an ordinal (e.g. grades 1-5) and an
interval scaled (e.g. age) variable for 25 people (data view).

3) Evaluate the distribution of the interval and the ordinal scaled variables in the dataset making
use of

a. a histogram

The histogram on the left represents the grades in the class. No single grade clearly dominates; the
grade with the highest frequency is 3 and the grade with the lowest frequency is 4.
The histogram on the right represents the ages in the dataset. The most common value is 19, and the
histogram also shows that there are more “young people” in the data than “old people”.

b. a Box(-and-whisker)-plot
The box-and-whisker plot (boxplot) on the right represents the variable age and it is perfectly
symmetrical. This means that the median lies exactly halfway between the first quartile and the third
quartile.
The boxplot on the left is skewed to the right, meaning that most of the values in the data are on the left
side of the distribution, with a longer tail toward the higher values.

c. skewness and kurtosis measures.

Kurtosis is a measure of how flat or sharp the peak of the distribution is. Expressed as excess kurtosis,
a normal distribution has a value of 0; negative values indicate a flatter peak and positive values a
sharper one. Our data suggest a flatter peak than that of a normal distribution.

Skewness is a measure of the side on which more of the sample data is gathered. A value of 0 indicates a
symmetrical distribution, a positive value a tail pointing to the right, and a negative value a tail pointing
to the left. For example, the variable grade has a skewness of 0.15, which means that we have a slight
positive skew with a tail pointing to the right.
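
To illustrate how these two measures are obtained, here is a minimal sketch using the simple moment-based formulas. The contents of “assignment 1.sav” are not reproduced in this text, so the male ages from question 2 serve as stand-in data; the function name shape_measures is hypothetical, and PSPP may apply small-sample corrections, so its output can differ somewhat from these values.

```python
def shape_measures(values):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skewness, excess_kurtosis

male_ages = [21, 25, 17, 22, 45, 18, 20, 19, 22]  # stand-in data from question 2
symmetric = [1, 2, 3, 4, 5]                       # perfectly symmetrical example

skew_male, kurt_male = shape_measures(male_ages)
skew_sym, kurt_sym = shape_measures(symmetric)
print("male ages:", skew_male, kurt_male)
print("symmetric:", skew_sym, kurt_sym)
```

The single age of 45 pulls the tail of the male ages to the right, so their skewness is positive, while the symmetrical example has a skewness of 0 and a negative excess kurtosis (a flat shape).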

4) Do your variables follow a normal distribution? How can you see? Please motivate your
answer.

Our variables are approximately normally distributed. From the histograms it is visible that both of our
variables are slightly skewed to the right (skewness values 0.15 and 0.2), and the kurtosis is close to 0,
with peaks that are a bit flatter than that of a normal distribution. Because these two variables have
minuscule values for kurtosis and skewness, it is fair to say that they follow a normal distribution
approximately, but not exactly.

5) Calculate the mean and the median for the interval and ordinal scaled variables and which one
would you prefer for which variable and why?
The median is the middle value when all the values are arranged in order from lowest to highest. The
arithmetic mean is the sum of all values divided by the number of values. In terms of what is more
practical in statistical analysis I would say the mean; however, the median is robust to outliers,
whereas the mean is not.
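
The different sensitivity to outliers can be sketched with the male ages from question 2, treating the age of 45 as the outlier (an assumption made only for illustration; the variable names are likewise illustrative).

```python
import statistics

male_ages = [21, 25, 17, 22, 45, 18, 20, 19, 22]
# Drop the single extreme age to see how each measure reacts
without_outlier = [age for age in male_ages if age != 45]

mean_with = statistics.mean(male_ages)
mean_without = statistics.mean(without_outlier)
median_with = statistics.median(male_ages)
median_without = statistics.median(without_outlier)

mean_shift = mean_with - mean_without
median_shift = median_with - median_without
print("mean shift:", round(mean_shift, 2))
print("median shift:", median_shift)
```

Removing the one extreme value moves the mean by several times as much as it moves the median.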

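For the interval-scaled variable (age), using the mean gives an accurate description of the data. For the
ordinal variable (grade), the median is the more appropriate choice, because the distances between
grades are not necessarily equal. Since neither of the variables has extreme values, the mean and the
median are close together.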

Think about rolling a die. Assume you roll the die 6 times. For this thought experiment, write down an
expected outcome (even or odd) for each of the 6 rounds. Then really roll the die 6 times
(https://www.random.org/dice/), and write down, how many times out of the 6 trials you guessed the
true outcome correctly.

Expected outcome    Real outcome

2 even              6 even
4 even              3 odd
1 odd               4 even
3 odd               4 even
5 odd               6 even
6 even              6 even

6) Calculate the probability of guessing as many correct results as you had.

Number of correct guesses: X = 2 (out of n = 6 trials)

Probability of success: p = 0.5

Probability of guessing 2 out of 6 correct is 0.2344, which is 23.44%.
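
This result follows from the binomial distribution with 6 trials and a success probability of 0.5. A minimal sketch, assuming each guess is independent (the function name binomial_pmf is illustrative):

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Exactly 2 correct guesses out of 6, each with probability 0.5: 15/64
p_exactly_two = binomial_pmf(2, 6, 0.5)
print(p_exactly_two)
```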

7) Calculate the probability of guessing 2 or less correctly.

P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)

Probability of success: p = 0.5

Probability of guessing 2 or less correct is 0.3438, which is 34.38%.
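
The cumulative probability adds up the binomial probabilities for 0, 1 and 2 correct guesses. A minimal sketch under the same assumption of independent guesses with success probability 0.5 (the function name binomial_cdf is illustrative):

```python
from math import comb

def binomial_cdf(k_max, n, p):
    """Probability of at most k_max successes in n independent trials."""
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(k_max + 1)  # includes k = 0
    )

# (1 + 6 + 15) / 64 = 22/64
p_two_or_fewer = binomial_cdf(2, 6, 0.5)
print(p_two_or_fewer)
```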
