
Eeman Qureshi Lab 6

Lab 6 Notes & Assignment

Q1: First, we are going to import an Excel dataset. Provide the command to open the
‘student_scores’ dataset in your do-files.

Q2: Create a variable ‘GRE_group’ which stores 1 if the student scored below 300, 2 if the
student scored between 300 and 320 inclusive, and 3 if the student scored above 320. Label the
values of the variable so that it shows the following:
1-“Below Average”
2-“Above Average”
3-“Excellent”
Tabulate the variable to see if it’s been labelled/coded correctly.

Q3: How do we get the summary statistics for SAT, ACT, and GRE scores? Provide the command
in your do-files.

Question: Suppose you want to admit a student based on SAT or ACT scores. Candidate X
scored 1300 on the SAT, while candidate Y scored 954 on the ACT. Which candidate will you
choose?
A: We cannot choose based on the raw scores because each test has a different total score.
We first need to standardize the scores by applying the Z-score formula:
z = (x - mean) / sd
where x is the raw score, and mean and sd are the mean and standard deviation of the scores on that test.

Q4) After variables have been standardized, can you interpret the standardized values of the SAT,
GRE, and ACT scores? Provide the interpretation of the scores as a comment in your do-files.
*SAT Standardized: -0.8144
*GRE Standardized: +1.22
*ACT Standardized: -0.4578

What problems can arise if we type in data manually?


*Entering this data manually can lead to typing errors, and the same command has to be retyped
for the same data set every time the numbers change.

To automate this, we can use stored results (called locals in these notes) together with macro quotes.
The sum command stores its results, such as the mean and the standard deviation, and typing return list
shows the information that is stored. To use a stored value in a command, wrap its name in the signs `'
(the opening sign is the button to the left of the digit one on a Windows keyboard, or next to Z on a
MacBook; the closing sign is an ordinary apostrophe). We must always run summary statistics for the
specified variable before calling a local.
Type the following command:
sum ACT

After the command sum, type return list to see the info stored as locals and then type the following
command:

di (954-`r(mean)')/`r(sd)'

NOTE: You must always type the sum command before calling a local.
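
The steps above can be sketched as one do-file block. This is only a minimal sketch, assuming the dataset is already loaded and contains the ACT variable. The local names act_mean and act_sd are hypothetical and are not part of the lab. Copying r(mean) and r(sd) into named locals is optional, but it protects the values from being overwritten by the next command that stores results.

```stata
* Run summarize first so that r(mean) and r(sd) are filled in
summarize ACT

* Show everything that summarize stored
return list

* Copy the stored results into named locals (hypothetical names)
local act_mean = r(mean)
local act_sd = r(sd)

* Standardized (Z) score for a raw ACT score of 954
display (954 - `act_mean') / `act_sd'
```

Run the block in one go: locals defined in a do-file disappear when that run ends.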

Q5) Suppose you want to admit a student based on SAT or ACT scores. Candidate X scored
1200 on the SAT, while candidate Y scored 850 on the ACT. Which candidate will you
choose? Provide the entire working and answer as a comment in your do-files.

Generating Variables using Locals:


We can also generate a variable using locals: combining the gen command with a local gives a
variable that holds the standardized ACT score of each student in the data set.
Use the following commands:
sum ACT
gen ACT_standardized= (ACT-`r(mean)')/`r(sd)'
This command performs the same calculation for every ACT score, so it generates a standardized
score for each observation of the ACT variable.
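
One way to check the new variable, sketched on the assumption that ACT_standardized was created with the commands above: a standardized variable should have a mean of roughly 0 and a standard deviation of roughly 1. The sketch also builds the same score with the std() function of egen for comparison. ACT_std_check is a hypothetical variable name.

```stata
* A correctly standardized variable has mean close to 0 and sd close to 1
summarize ACT_standardized

* egen's std() function standardizes in one step (hypothetical name)
egen ACT_std_check = std(ACT)

* The two versions should agree up to rounding
summarize ACT_standardized ACT_std_check
```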

Q6) Generate a variable which stores the standardized score for SAT scores.

Normal Distribution:
What is the Normal Distribution?

Normal distributions have key characteristics that are easy to spot in graphs: The mean, median
and mode are exactly the same. The distribution is symmetric about the mean—half the values
fall below the mean and half above the mean. The distribution can be described by two values:
the mean and the standard deviation.

In a normal distribution curve, the scores are spread equally on both sides of the mean. This
produces a bell-shaped curve which is symmetrical around the middle value.

Q8) How can we check whether ACT_standardized is normally distributed using graphs?


command: histogram ACT_standardized, freq normal

● We check for three things when analyzing statistical data:


1) Normal or not
2) Skewness
3) Kurtosis

Pearson Skewness Formula:

Skewness = (X bar - Mo) / s

X bar is the mean, Mo is the mode, and s is the std. dev.

Skewness

Three possibilities of the Pearson Skewness Test:


● Negative direction: the distribution is skewed to the left, where Mean < Mode, so you get a
negative value on the Pearson Skewness test.
● Positive direction: the distribution is skewed to the right, where Mean > Mode, so you get a
positive value on the Pearson Skewness test.
● If the skewness value is zero, the distribution is symmetrical.
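
The three cases can be checked on the ACT variable with a short sketch. It assumes the Pearson mode skewness formula (mean - mode) / std. dev. It uses the mode() function of egen with the minmode option, which picks the smallest value when several values tie for the mode. That tie-breaking choice is an assumption, not something these notes specify. ACT_mode and act_mode are hypothetical names.

```stata
* Store the mode of ACT in a new variable (lowest mode if there are ties)
egen ACT_mode = mode(ACT), minmode

* ACT_mode is constant, so its mean is the mode itself
summarize ACT_mode, meanonly
local act_mode = r(mean)

* Pearson skewness: (mean - mode) / std. dev
summarize ACT
display (r(mean) - `act_mode') / r(sd)
```

A negative result points to left skew and a positive result to right skew. The skewness reported by sum ACT, detail is computed from the third moment rather than from the mode, so its value will generally differ from this one.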

Which command can we use to see skewness in data?


sum ACT, detail
Or
sum ACT, d
We can also check this graphically by typing the following command:
histogram ACT, freq normal

Kurtosis:
Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal
distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers. Data sets with low
kurtosis tend to have light tails, or lack of outliers. A uniform distribution would be the extreme case.
● We can also check Kurtosis using the sum ACT, detail command
● The kurtosis of a normal distribution is 3

We can compare the kurtosis with the normal value of 3 using a local in the following way:


sum ACT, d
Kurtosis: 1.912563
di `r(kurtosis)'-3
● A positive value tells you that you have heavy tails (i.e. a lot of data in your tails): the
distribution is Leptokurtic.
● A negative value means that you have light tails (i.e. little data in your tails): the distribution is Platykurtic.
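
The rule in the two bullets can be written as a small sketch that labels the result automatically. It assumes the ACT variable is loaded, and the local name excess is hypothetical. With the kurtosis shown above, 1.912563 - 3 = -1.087437.

```stata
* detail is required: plain summarize does not store r(kurtosis)
summarize ACT, detail
local excess = r(kurtosis) - 3
display `excess'

* Classify by the sign of kurtosis minus 3
if `excess' > 0 {
    display "Leptokurtic: heavier tails than the normal distribution"
}
else if `excess' < 0 {
    display "Platykurtic: lighter tails than the normal distribution"
}
else {
    display "Same kurtosis as the normal distribution"
}
```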

Platykurtic

Platykurtic distributions have negative excess kurtosis (kurtosis minus 3). The tails are very thin
compared to the normal distribution or, as in the case of the uniform distribution, non-existent.
This distribution has fewer outliers than the normal distribution, and its kurtosis is less than 3.

Leptokurtic


Leptokurtic distributions have more outliers than the normal distribution, and their kurtosis is greater
than 3. They have heavier tails than the normal distribution. One example of a leptokurtic distribution is
the t-distribution.

Q9: What does a Kurtosis of 1.91 mean? Choose from the two options below and write the answer
as a comment in your do-files
Leptokurtic
Platykurtic

Q10: What does Kurtosis of 2.98 mean? Choose from the two options below and write the answer
as a comment in your do-files.
Leptokurtic
Platykurtic

Q11) What does Kurtosis of 3.5 mean? Choose from the two options below and write the answer as
a comment in your do-files.
Leptokurtic
Platykurtic

Q12) What does Kurtosis of 4.5 mean? Choose from the two options below and write the answer as
a comment in your do-files.
Leptokurtic
Platykurtic

Q13) Open the Lab 5 dataset. Using the histogram and sum, detail commands, comment on the
skewness and kurtosis of the following variables.
Variable Name | Positively Skewed | Negatively Skewed | Zero Skewed | Leptokurtic | Platykurtic
A             |                   |                   |             |             |
B             |                   |                   |             |             |
X = 7A+4B     |                   |                   |             |             |
Y = 5C-3D     |                   |                   |             |             |
L             |                   |                   |             |             |
S             |                   |                   |             |             |
K             |                   |                   |             |             |
