
20211022 Class Activity 2

Name: 潘氏春瓊 (Quinn)


Student ID: 11096020

Step 1: Summarize the reason:


Because cities offer more convenience and modern technology, students who live in big
cities have a wider variety of opportunities to broaden their knowledge than students who
live in suburbs or in the countryside. My motivation for collecting the data is to check
whether the place where students lived affects their rank. The sample is described below:

I collected data from 100 students who took the university entrance exam in 2016. In
Vietnam, this exam uses the same questions across the whole country.
SCORE is the student's total score: the sum of the scores in 3 subjects, each marked out
of 10.
The 3 largest cities among Vietnam's 63 provinces are Ho Chi Minh city, Hanoi city and
Da Nang city. Students who lived and studied mainly in these 3 cities before the exam
were coded "1" in the CITY column; all other students were coded "0".
RANK is the student's position in the ranking, in ascending order, so a smaller number
means a better position.
The dependent variable is RANK; the independent variables are SCORE and CITY.
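A minimal Python sketch of how the CITY column could be coded from each student's place of residence. The helper name city_dummy and the exact spelling of the place names are assumptions for illustration; the original data may well have been coded by hand.

```python
# Hypothetical helper: code the CITY dummy described in Step 1.
# The three city names come from the text; their spelling in the
# real dataset is an assumption.
BIG_CITIES = {"Ho Chi Minh city", "Hanoi city", "Da Nang city"}


def city_dummy(place: str) -> int:
    """Return 1 if the student lived and studied mainly in one of the
    three largest cities, otherwise 0."""
    return 1 if place in BIG_CITIES else 0


# Example with made-up places (not taken from the survey):
places = ["Hanoi city", "Nghe An", "Da Nang city"]
print([city_dummy(p) for p in places])  # [1, 0, 1]
```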
Step 2: Calculate the summary statistics:
The mean SCORE is 21,87. The minimum and maximum SCORE are 17,50 and 24,80,
respectively.
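The same summary statistics can be computed with Python's standard statistics module. The three scores below are made-up placeholders, not the survey data (which has 100 students), so the output does not reproduce the values above; the function name summarize is hypothetical.

```python
from statistics import mean


def summarize(scores):
    """Return (mean, minimum, maximum) of a list of SCORE values."""
    return mean(scores), min(scores), max(scores)


# Made-up scores for illustration only (decimal point instead of comma).
toy_scores = [17.5, 21.0, 24.8]
m, lo, hi = summarize(toy_scores)
print(round(m, 2), lo, hi)
```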
Step 3: Run the regression (regress y on one x variable)

Step 4: Estimate the coefficients:


$$\widehat{\mathrm{RANK}}_i = 419{,}379 - 16{,}9748\,\mathrm{SCORE}_i$$
This means that if SCORE increases by 1 mark, the student's predicted RANK decreases by
16,9748. There is a negative relationship between SCORE and RANK: the higher the SCORE,
the smaller the number in the RANK column. A better-ranked student has a smaller RANK
number.
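A sketch of the ordinary least-squares formulas behind a one-variable regression, together with the Step 4 equation written with a decimal point instead of a comma. The names ols_simple and predict_rank are hypothetical; the reported coefficients presumably came from statistical software, not from this code.

```python
def ols_simple(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = covariance-style sum over variance-style sum.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # the fitted line passes through the means
    return b0, b1


def predict_rank(score):
    """Fitted RANK from the Step 4 equation."""
    return 419.379 - 16.9748 * score


# One extra mark lowers the fitted RANK by the slope:
print(round(predict_rank(21.0) - predict_rank(22.0), 4))  # 16.9748
```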
Step 5: Interpret the R-squared:
The R-squared of the model is 0,947601, which means that the model's input (SCORE)
explains about 94,7601% of the variation in RANK; the remaining 5,2399% is due to
variables outside the model and random errors.
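R-squared is one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch (the function name r_squared is hypothetical), followed by the split of the reported value into explained and unexplained shares:

```python
def r_squared(y, y_hat):
    """Share of the variation in y explained by the fitted values y_hat."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)  # total variation
    return 1 - ss_res / ss_tot


# Reported R-squared of the one-variable model (decimal point instead of comma):
r2 = 0.947601
print(f"explained: {r2:.4%}, unexplained: {1 - r2:.4%}")
```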
Step 6: Add the second independent variable (regress y on two x variables)

Step 7: Estimate the coefficients:


$$\widehat{\mathrm{RANK}}_i = 419{,}231 - 17{,}0021\,\mathrm{SCORE}_i + 2{,}26313\,\mathrm{CITY}_i$$
The result shows that, holding CITY constant, if SCORE increases by 1 unit, the student's
predicted RANK decreases by 17,0021. The relationship between RANK and SCORE is negative
because the higher a student's total score, the smaller the number in this student's
RANK column. The coefficient of the CITY variable is 2,26313: for two students with the
same SCORE, the predicted RANK number of the student who lived in one of the 3 cities is
2,26313 higher (that is, a slightly worse position) than that of the student who did
not. This coefficient measures the difference in RANK between students who did and did
not live in these cities.
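A sketch of the two-variable least-squares fit using NumPy's `np.linalg.lstsq`, plus the Step 7 equation with decimal points. The names ols_multi and predict_rank2 are hypothetical. Comparing two students with the same SCORE shows that the CITY coefficient is exactly the predicted gap in RANK:

```python
import numpy as np


def ols_multi(score, city, rank):
    """Least-squares fit of RANK = b0 + b1*SCORE + b2*CITY."""
    # Design matrix: a column of ones for the intercept, then the regressors.
    X = np.column_stack([np.ones(len(score)), score, city])
    coef, *_ = np.linalg.lstsq(X, np.asarray(rank, dtype=float), rcond=None)
    return coef  # array [b0, b1, b2]


def predict_rank2(score, city):
    """Fitted RANK from the Step 7 equation."""
    return 419.231 - 17.0021 * score + 2.26313 * city


# Same SCORE, different CITY: the gap equals the CITY coefficient.
print(round(predict_rank2(22.0, 1) - predict_rank2(22.0, 0), 5))  # 2.26313
```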
After adding CITY to the model, the coefficients may still be accurate, because none of
the estimated coefficients is zero. However, the coefficient of variation (C.V.) of CITY
is higher than its mean, which suggests that using CITY to explain RANK brings more risk
than reward to the regression model.
The C.V. of SCORE is the lowest, which suggests that SCORE should be relied on more than
CITY to explain the order of students' rank, because it offers the best risk-to-reward
ratio.
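Assuming the C.V. in the software output is the usual sample standard deviation divided by the mean, it can be computed as below. The function name is hypothetical and the 0/1 list is a made-up stand-in for a dummy column like CITY, not the survey data.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Made-up dummy column: one "1" among four students.
cv = coefficient_of_variation([0, 0, 0, 1])
print(cv)  # 2.0
```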
Step 8: Compare the two results and interpret the adjusted R-squared:
The new R-squared is 0,948905, which means that SCORE and CITY together explain
94,8905% of the variation in RANK; the other 5,1095% is due to variables outside the
model and random errors. Compared with the first model (0,947601), the R-squared rises by
only 0,001304. Because R-squared never falls when a variable is added, this small rise
alone does not show that CITY improves the model.
Adding an extra independent variable (CITY) to the model makes the adjusted R-squared
0,001053 lower than the R-squared. This suggests that the new variable adds little to
explaining the order of students' rank by the city where they lived.
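The adjusted R-squared penalizes R-squared for the number of independent variables k, given n observations: 1 - (1 - R²)(n - 1)/(n - k - 1). The sketch below (function name hypothetical) applies this standard formula to the reported values with n = 100. For the two-variable model, the gap between R-squared and adjusted R-squared agrees with the 0,001053 above up to rounding. The same formula gives about 0,94707 for the SCORE-only model and about 0,94785 with CITY added, which is the like-for-like figure when comparing the two models.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


adj1 = adjusted_r_squared(0.947601, n=100, k=1)  # SCORE only
adj2 = adjusted_r_squared(0.948905, n=100, k=2)  # SCORE and CITY

print(round(0.948905 - adj2, 6))  # gap between R-squared and adjusted R-squared
print(round(adj1, 5), round(adj2, 5))
```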
