
10190598_SFM_A1.1

WORD COUNT: 6552

Contents
I. INTRODUCTION
II. MAJOR FINDINGS
Part A. Data Sources & Method Collection
1. Data Sources in Business & Economics
2. Data collection method in Business and Economics
3. Methods of data analysis
Part B.
1. Evaluate each variable
2. Relationship between variables
3. Evaluation on the results of summary statistics
4. Methods of communication
5. Utilization of different types of charts
Part C. Analyse and evaluate business data
1. T-test to compare variables
2. Z-test to compare variables
3. Regression Model
4. Comparison of Summary statistics in Part B & Hypothesis Testing in Part C
5. Comparison using Regression Analysis in Part C and correlation coefficients in Part B
III. CONCLUSION
IV. REFERENCES

I. INTRODUCTION
This coursework paper analyzes 158 observations drawn from a web survey of students in the US.
In addition to simple data description, the paper also examines relationships between different
variables and evaluates the communication and methods used to describe the data and draw
inferences from it.

II. MAJOR FINDINGS
Part A. Data Sources & Method Collection
1. Data Sources in Business & Economics
Primary data is data that directly addresses the research topic or purpose and is produced through
the researcher's interaction with the research subject via surveys, interviews, questionnaires,
observations, experiments, etc. (Kabir, 2016).

Secondary data is information drawn from results that other researchers have already collected,
used, and evaluated.

2. Data collection method in Business and Economics


Primary data collection methods can be classified into two categories (Kabir, 2016):

Qualitative methods relate to elements that cannot be quantified or evaluated on an arithmetic
scale, such as words, sounds, sensations, emotions, colors, etc.

• Open-ended surveys: Respondents can freely and flexibly provide numerous responses.
• Interview: It can be one-on-one or a focus group, depending on the purpose of the study and the
accessibility of the researcher. The researcher asks the participants questions and records
their responses.
• Direct observation: A passive method of data collection in which researchers observe the
context in which participants behave and variables cannot be regulated.

Quantitative methods provide data as numbers that can be analysed mathematically. Random
sampling and structured data collection tools are used to fit different experiences into preset
answer categories. This makes it easy to compare and evaluate the magnitude of different
variables, especially in studies with many variables.

• Closed-ended surveys: A questionnaire with a list of pre-programmed questions and answers,
divided into two categories: dichotomous and Likert scale.
• Experiments: Researchers attempt to understand cause-effect relationships by observing the
impact of some variables on others, and they are able to modify some of them.
• Obtaining pertinent data from management information systems.

Secondary data collection can draw on both quantitative and qualitative methods. The data is
gathered from existing and accessible internal and external sources such as business journals,
previous studies, published works, books, etc.

Table 1 Advantages & Drawbacks of 2 types of sources (Kwantlen Polytechnic University, 2020)

PRIMARY DATA
Pros
• Closely related to the research topic and meets the researcher's demands
• More up-to-date and timely
• High level of reliability, since the source is in a raw state and unaltered by other people
Cons
• Expensive to implement (including the preparation phase) when reaching many individuals; the
level of expenditure depends on the main approach and geographic scope, printing costs (if not
conducted online), and the cost of finding participants
• Time-consuming, depending on the sample size needed, and more so with a mainstream approach
such as interviews; the data also has to be cleaned and put into a database
• Degree of trust is uncertain in some cases, since responses sometimes show a person's thoughts
and beliefs but not their behaviours

SECONDARY DATA
Pros
• Economical when many quality resources are published and easily accessible
• Resources can be put to effective use quickly because they have already been collected and
analysed
• Larger sample size, since many studies combine the resources of governments or organizations,
allowing them to reach thousands or millions of people
Cons
• Relevant data may be outdated or non-existent
• Answers may not be specific to the researcher's question
• Accuracy and reliability are sometimes lower because not all sources are reputable or, for
personal reasons, some information may be exaggerated

3. Methods of data analysis
Table 2 Descriptive statistics and Inferential statistics (HILLIER, 2021)

DESCRIPTIVE STATISTICS
Characteristics
• Branch of statistics that describes the whole population under investigation
• Method of organizing, examining & visually displaying data
• Ultimate result is graphs, charts, and tables
Tools
• Central Tendency: identifying the centre of a frequency distribution for a data set by
specifying the mean (interval/ratio variables), median (ordinal variables or interval/ratio
variables with a skewed distribution), or mode (nominal variables)
• Dispersion: indicating how dispersed the data are by range, standard deviation, variance, etc.
• Skewness: describing the shape of the distribution (symmetrical, left-tailed, right-tailed)
Pros
• Data obtained can be both qualitative & quantitative, giving a comprehensive perspective of a
study issue
Cons
• Due to the lack of statistical testing, research results contain some level of bias
• Research is limited, since it does not explain the cause and effect of a research topic

INFERENTIAL STATISTICS
Characteristics
• Generalizing specific facts about an entire population by evaluating one sample of it
• Compare, draw conclusions, predict data
• Ultimate result is the probability of a particular occurrence
Tools
• Hypothesis Testing: to verify which of two mutually exclusive statements concerning a
population's attributes is correct
• Confidence Intervals: a range of values above and below the sample statistic, used to ascertain
the extent of uncertainty or certainty of a sampling method
• Regression analysis: to measure the strength of a dependent variable's association with one or
several independent variables, as well as to model their future relationship
Pros
• It is unbiased because it is highly structured
Cons
• Because conclusions about the population are inferred from sample findings, accuracy is lower

Part B.
1. Evaluate each variable
a. For qualitative variables
Nominal

• Q1: University

University
Frequency Percent Valid Percent Cumulative Percent
Valid Colorado 97 61.4 61.4 61.4
Oakland 61 38.6 38.6 100.0
Total 158 100.0 100.0

Of the 158 survey participants, 61.4% (97 students) were students at Colorado, while the
number of web survey participants from Oakland was about 1.6 times lower, at only 38.6%
(61 students).

• Q2: Accommodation

Accommodation
Frequency Percent Valid Percent Cumulative Percent
Valid Dorm 79 50.0 50.0 50.0
Other 2 1.3 1.3 51.3

Parents 43 27.2 27.2 78.5


Share Apt 24 15.2 15.2 93.7
Solo Apt 10 6.3 6.3 100.0

Total 158 100.0 100.0

Dorm is the most popular option for students, chosen by half of those polled (50.0%). The next
most common accommodation options are Parents, Shared Apartment, and Solo Apartment (27.2%,
15.2%, and 6.3% respectively). Other places of residence are negligible at only 1.3%.

• Q3: CellPhone

CellPhone
Frequency Percent Valid Percent Cumulative Percent
Valid Alltel 4 2.5 2.5 2.5
Cingular 47 29.7 29.7 32.3
Nextel 5 3.2 3.2 35.4
Other 7 4.4 4.4 39.9
Sprint 14 8.9 8.9 48.7
T-Mobile 21 13.3 13.3 62.0
Verizon 59 37.3 37.3 99.4
Virgin 1 .6 .6 100.0
Total 158 100.0 100.0

Overall, Verizon leads the student phone market with 37.3%, followed by Cingular with
29.7%. The next most popular carriers are T-Mobile, Sprint, Other, Nextel, and Alltel, in
that order. Virgin has the lowest share of participants at 0.6%.

• Q5: Estimated Minutes

Estimated minutes
Frequency Percent Valid Percent Cumulative Percent
Valid 0 17 10.8 10.8 10.8
1 141 89.2 89.2 100.0
Total 158 100.0 100.0

A very large proportion of students, 89.2%, gave an estimate for the 4th question; only
10.8% replied with an exact number.

• Q6: Credit Card

CreditCard
Frequency Percent Valid Percent Cumulative Percent
Valid Amex 7 4.4 4.4 4.4
Discover 3 1.9 1.9 6.3
Mastercard 28 17.7 17.7 24.1
None 3 1.9 1.9 25.9
Other 1 .6 .6 26.6
Visa 116 73.4 73.4 100.0
Total 158 100.0 100.0

Visa dominates the ranking of credit cards used by students, with approximately three quarters
of all answers (73.4%). Although its share is approximately 4 times lower than Visa's,
Mastercard usage greatly outweighs that of Amex, Discover, and Other (17.7% compared to 4.4%,
1.9%, and 0.6% respectively). A further 1.9% of participants said that they do not use credit
cards.

• Q8: Estimated Balance

Estimated balance

Frequency Percent Valid Percent Cumulative Percent


Valid 0 1 .6 .6 .6
1 66 41.8 41.8 42.4
2 91 57.6 57.6 100.0
Total 158 100.0 100.0

The survey question for this variable is "Did you enter an exact number above, or an estimate?",
where 1 is an exact number and 2 is an estimate. The table shows a single value of 0, which may
be a missing value caused by a data-entry error. 57.6% of students entered an estimate in the
previous question, and 41.8% entered the exact number.

• Q19: Textbook

Textbook

Frequency Percent Valid Percent Cumulative Percent


Valid Campus store 101 63.9 63.9 63.9
No book 3 1.9 1.9 65.8
On-line 28 17.7 17.7 83.5
Other retail 18 11.4 11.4 94.9
Other student 8 5.1 5.1 100.0
Total 158 100.0 100.0

Most students (63.9%) said that they bought their statistics textbook at the Campus Store.
Buying online takes second place with 17.7%, followed by Other Retail with 11.4% of respondents
and other students with 5.1%. Only 1.9% said they do not have a book.

• Q20: Using Electronic Textbook

Using Electronic Textbook

Frequency Percent Valid Percent Cumulative Percent


Valid No 121 76.6 76.6 76.6
Yes 37 23.4 23.4 100.0
Total 158 100.0 100.0

Approximately three quarters of participating students (76.6%) have never used an electronic
textbook as the primary source in their college courses, while only 23.4% have.

• Q23: Operating System

Operating System

Frequency Percent Valid Percent Cumulative Percent


Valid Mac 23 14.6 14.6 14.6
Other 1 .6 .6 15.2
Windows 134 84.8 84.8 100.0
Total 158 100.0 100.0

Windows is the computer operating system most preferred by respondents, at 84.8%. Mac is far
less popular, chosen by only a minority of respondents (14.6%).

• Q24: Using Linux

Using Linux
Frequency Percent Valid Percent Cumulative Percent
Valid No 84 53.2 53.2 53.2
What's Linux? 52 32.9 32.9 86.1
Yes 22 13.9 13.9 100.0
Total 158 100.0 100.0

A majority of students (53.2%) have never used Linux, while nearly one-third of participants
(32.9%) do not know what Linux is. Only 13.9% said "Yes" when asked whether they had used
Linux.

• Q25: Using iPod or MP3


Using iPod or MP3

Frequency Percent Valid Percent Cumulative Percent


Valid Both 14 8.9 8.9 8.9
iPod only 97 61.4 61.4 70.3
MP3 only 22 13.9 13.9 84.2
Neither 25 15.8 15.8 100.0
Total 158 100.0 100.0

The number of people using an iPod only is much higher than the number using an MP3 player
only (97 people compared to 22). The proportion of the sample who use neither device (15.8%)
is slightly higher than that of MP3-only users (13.9%), a difference of only 1.9 percentage
points. The lowest proportion, 8.9%, uses both devices.

• Q26: Game Cube

Game Cube
Frequency Percent Valid Percent Cumulative Percent
Valid No 140 88.6 88.6 88.6
Yes 18 11.4 11.4 100.0
Total 158 100.0 100.0

A very large proportion of respondents do not own a Game Cube; this proportion is almost 8
times higher than the proportion of students who own one (88.6% compared to 11.4%).

• Q27: PS2

PS2
Frequency Percent Valid Percent Cumulative Percent
Valid No 97 61.4 61.4 61.4
Yes 61 38.6 38.6 100.0
Total 158 100.0 100.0

About 60% of participants (61.4%) do not own a PS2, whereas the rest of the sample (38.6%) do.

• Q28: PS3

PS3
Frequency Percent Valid Percent Cumulative Percent

Valid No 152 96.2 96.2 96.2
Yes 6 3.8 3.8 100.0
Total 158 100.0 100.0

Almost all students (96.2%) do not own a PS3; only 6 of the 158 participants, i.e. 3.8%, said
they own one.

• Q29: XBOX or XBOX 360

Xbox or Xbox 360

Frequency Percent Valid Percent Cumulative Percent


Valid XBOX 123 77.8 77.8 77.8
XBOX360 35 22.2 22.2 100.0
Total 158 100.0 100.0

Slightly more than three quarters of participants (77.8%) own an XBOX. Meanwhile, the share of
participants who own the XBOX360 version is lower, at only 22.2%.

• Q30: Nintendo Wii

Nintendo Wii
Frequency Percent Valid Percent Cumulative Percent

Valid No 145 91.8 91.8 91.8
Yes 13 8.2 8.2 100.0
Total 158 100.0 100.0

The overwhelming majority of students (91.8%) do not own a Nintendo Wii; this percentage is
about 11 times higher than that of students who own one (8.2%).

• Q31: Other Game system

Other Game system


Frequency Percent Valid Percent Cumulative Percent
Valid No 114 72.2 72.2 72.2
Yes 44 27.8 27.8 100.0
Total 158 100.0 100.0

Apart from the game systems covered by the variables above, only roughly one quarter of
respondents (27.8%) use a different game system; the rest do not use any other game system.

• Q17: Language skill

Language Skill
Frequency Percent Valid Percent Cumulative Percent
Valid Fluent 25 15.8 15.8 15.8
Moderate 46 29.1 29.1 44.9
None 22 13.9 13.9 58.9
Slight 65 41.1 41.1 100.0
Total 158 100.0 100.0

"Slight" is the most common language skill level among students, at 41.1%, followed by
"Moderate" at 29.1%. The number of participants at the "Fluent" level is much lower, roughly
half that of the "Moderate" level, and is approximately equal to the share of students with
"None" (15.8% compared to 13.9%).

• Q18: Frequency of Reading

Frequency of Reading

Frequency Percent Valid Percent Cumulative Percent

Valid Never 26 16.5 16.5 16.5
Occasionally 110 69.6 69.6 86.1
Regularly 22 13.9 13.9 100.0
Total 158 100.0 100.0

Students do not seem to have a habit of reading regularly: "Regularly" has the lowest rate, at
only 13.9%. 69.6% of students read books "Occasionally", the highest of the three frequency
levels offered. The remaining 16.5% answered "Never", suggesting they have no interest in or
time for reading.

• Q22: PC Access

PC Access

Frequency Percent Valid Percent Cumulative Percent


Valid Always 104 65.8 65.8 65.8
Never 32 20.3 20.3 86.1
Often 13 8.2 8.2 94.3
Rarely 9 5.7 5.7 100.0
Total 158 100.0 100.0

Nearly two-thirds of students (65.8%) "Always" have access to a laptop PC to bring to class, the
highest share of any category. The share who "Never" have access (20.3%) is about 3 times
smaller than "Always" but more than 2 times larger than "Often" (8.2%). Only 5.7% of the sample
"Rarely" have access to a PC.
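
The frequency tables in this subsection can be reproduced with a few lines of code. The sketch below is only an illustration in Python, not the tool used for the report; the function name `frequency_table` is hypothetical, and the responses are rebuilt from the Q1 counts reported above (97 Colorado, 61 Oakland).

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (category, count, percent, cumulative percent), sorted by category."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for category in sorted(counts):
        percent = 100 * counts[category] / total
        cumulative += percent
        rows.append((category, counts[category], round(percent, 1), round(cumulative, 1)))
    return rows


# Rebuild the Q1 responses from the reported counts (an illustration, not the raw data file).
responses = ["Colorado"] * 97 + ["Oakland"] * 61
table = frequency_table(responses)
for category, count, percent, cumulative in table:
    print(f"{category:<10}{count:>5}{percent:>8}{cumulative:>8}")
```

The printed rows should agree with the Frequency, Percent, and Cumulative Percent columns of the Q1 table.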

b. For quantitative variables

Descriptive Statistics
Variable                        N     Minimum   Maximum    Mean       Std. Deviation   Skewness   Std. Error of Skewness
Time using Cellphone Minutes    158   1         15000      730.22     1299.292         8.768      .193
Balance                         158   .00       12000.00   706.2920   1507.63976       4.980      .193
GPA                             158   1.600     4.000      3.17480    .490077          -.503      .193
Working Hours                   158   .0        60.0       12.193     14.0423          .880       .193
Ideal Child                     158   0         8          2.46       .842             2.521      .193
Parents Child                   158   1         10         2.65       1.123            2.128      .193
Car Age                         158   .0        2003.0     120.520    463.5347         3.858      .193
Job Market                      158   1.000     7.000      5.57278    .973221          -1.535     .193
Politics                        158   1.000     6.900      3.93692    1.445987         -.268      .193
Religious                       158   0         100        11.36      17.091           2.102      .193
Prefer Electronic               158   1.000     7.000      3.45084    1.725147         .427       .193
Valid N (listwise)              158

• Q4: Time using cell phone minutes


The maximum time a student spends on their cellphone is 15000 minutes per month, and the
minimum is 1 minute. The average time a student spends using a cellphone is 730.22 minutes per
month. The standard deviation is 1299.292, meaning that cell phone use varies widely around the
mean. The large skewness value of 8.768 (positive and higher than 1.3) means that the data for
this variable is highly skewed with a long right tail. This indicates that the majority of
students have a much shorter usage time than the mean.

• Q7: Balance
The degree of variation is extremely large for this variable: the standard deviation is
1507.63976, more than double the mean balance of 706.2920. The maximum balance of a student is
12000 and the lowest balance is 0.00. Skewness is 4.980 (positive and higher than 1.3), which
means the data for this variable is highly skewed with a right-skewed distribution. This shows
that most students at both universities have balances much lower than the mean.

• Q9: GPA
The highest GPA among students of the 2 universities is 4.0 and the lowest is 1.6. The average
GPA recorded is 3.1748. Skewness is -0.503 (negative and lower than -0.3), so the data is
moderately skewed with a short left tail, which means many students have a GPA higher than
3.1748.

• Q10: Working Hours
The maximum number of working hours per week is 60 hours, and the minimum is 0 hours. On
average, a student spends 12.193 hours a week doing a paid job. Skewness is 0.88 (positive and
higher than 0.3), so the data is moderately skewed with a short right tail: many students spend
less than the average time on paid jobs.

• Q11: Ideal Child


The ideal number of children for a couple, as suggested by students, ranges from a minimum of 0
to a maximum of 8. On average, the ideal number for a married couple is 2.46 children. Skewness
is 2.521 (positive and higher than 1.3), so the data is highly skewed with a long right tail.

• Q12: Parents Child
The largest family has 10 children, whereas the smallest has only 1 child. On average, each
student's family has 2.65 children. Skewness is 2.128 (positive and higher than 1.3), so the
data is highly skewed with a right-skewed distribution.

• Q13: Car Age
The Descriptive Statistics table reveals a few data-entry errors in this variable (for example,
a maximum of 2003, which is implausible as a car age). These outliers distort the distribution
and reduce the accuracy of the variable's statistics. For the same reason, the mean car age of
120.52 is far higher than most of the observed values. Skewness is 3.858 (positive and higher
than 1.3), so the data is highly skewed with a long right tail.

• Q14: Job Market


On a scale of 1 to 7, the highest score students give when assessing the current state of the
job market for their desired major is 7.0, and the lowest is 1.0. The job market potential of
students' intended occupations is rated highly, with a mean of 5.57278 and a moderate standard
deviation of 0.973221. Skewness is -1.535 (negative and lower than -1.3), which means the data
for this variable is highly skewed with a long left tail. Many students believe that the job
market for their intended occupation is promising, and most of the assessment scores are higher
than the average score.

• Q15: Politics
Students' self-assessed political orientation has a maximum value of 6.9 and a minimum value of
1.0 on a scale of 1 to 7. On average, students rate their political orientation at 3.93692.
Skewness is -0.268 (negative and higher than -0.3), so the data is fairly symmetrical or only
slightly left-skewed.

• Q16: Religious
Each student attended an average of 11.36 religious services over the past year. Skewness is
2.102 (positive and higher than 1.3), so the data is highly skewed with a long right tail.

• Q21: Prefer Electronic
Students rated their preference for an electronic version of the textbook at $75 over the paper
version at $125 on a scale of 1 to 7. Some students prefer the electronic textbook (maximum
score of 7), while others prefer the paper textbook (minimum score of 1). The mean of 3.45084
is on the low side of the scale, with a standard deviation of 1.725147, which means that on
average students are not more interested in the electronic version than in the paper textbook.
Skewness is 0.427 (positive and higher than 0.3), so the data has a short right tail and the
skew is not pronounced.
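
The skewness figures discussed above can be illustrated with a short sketch. It assumes the adjusted Fisher-Pearson skewness coefficient and the usual standard-error formula, which depends only on the sample size; the report does not state which formula the statistics package applied, so this is an assumption. The function names and the two small data sets are hypothetical.

```python
import math


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (assumed to match the package output)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))  # sample std. deviation
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


def skewness_std_error(n):
    """Standard error of skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


se = skewness_std_error(158)
print(round(se, 3))  # compare with the Std. Error column of the table

symmetric = [1, 2, 3, 4, 5]          # hypothetical data, no skew
right_tailed = [1, 1, 2, 2, 3, 30]   # hypothetical data with one large value
print(sample_skewness(symmetric), sample_skewness(right_tailed))
```

Because every variable has N = 158, the standard error is identical on every row, which is why the table shows .193 throughout.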

2. Relationship between variables
a. For pairs of qualitative variables – Q1 & Q2, Q17 & Q18
Q1 and Q2: Relationship between Students’ University and Accommodation

University
Colorado Oakland
Count Column N % Table N % Count Column N % Table N %
Accommodation Dorm 76 78.4% 48.1% 3 4.9% 1.9%
Other 2 2.1% 1.3% 0 0.0% 0.0%
Parents 2 2.1% 1.3% 41 67.2% 25.9%
Share Apt 14 14.4% 8.9% 10 16.4% 6.3%
Solo Apt 3 3.1% 1.9% 7 11.5% 4.4%

As the cross table and chart show, Colorado students mostly stay in a "Dormitory" (about 78.4%
of Colorado students), whereas only 3 of the 79 students living in a dorm are from Oakland. In
Oakland, most students live with their "Parents" (about 67.2% of Oakland students), while the
rate of staying with parents among Colorado students is extremely low (2.1%). "Share Apartment"
is the second most common choice at both schools. Apart from "Other", "Solo Apartment" is the
least popular choice, accounting for only about 6.3% of all students at the 2 universities
(Colorado and Oakland students contribute 1.9% and 4.4% of the sample respectively). Because
the distribution of accommodation differs so sharply between the two universities, the cross
table and clustered bar chart indicate a clear relationship between the 2 variables University
and students' Accommodation.
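
The Column N % figures in the cross table above can be recomputed from the counts. The following is a minimal Python sketch, not the procedure used to produce the report; the helper name `column_percentages` is hypothetical, and the counts are copied from the table.

```python
# Counts copied from the cross table above (rows: accommodation, columns: university).
counts = {
    "Dorm":      {"Colorado": 76, "Oakland": 3},
    "Other":     {"Colorado": 2,  "Oakland": 0},
    "Parents":   {"Colorado": 2,  "Oakland": 41},
    "Share Apt": {"Colorado": 14, "Oakland": 10},
    "Solo Apt":  {"Colorado": 3,  "Oakland": 7},
}


def column_percentages(table):
    """Share of each row category within each column (Column N %)."""
    column_totals = {}
    for row in table.values():
        for column, count in row.items():
            column_totals[column] = column_totals.get(column, 0) + count
    return {
        label: {col: round(100 * n / column_totals[col], 1) for col, n in row.items()}
        for label, row in table.items()
    }


column_pct = column_percentages(counts)
print(column_pct["Dorm"])     # dorm share within each university
print(column_pct["Parents"])  # living-with-parents share within each university
```

Comparing the column percentages across the two universities row by row is what the clustered bar chart shows visually.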

Q17 and Q18: Relationship between Students’ Language Skill and Frequency of Reading

Frequency of Reading by Language Skill (each cell: Count, Column N %, Table N %)
               Fluent               Moderate             None                 Slight
Never          5    20.0%   3.2%    6    13.0%   3.8%    4    18.2%   2.5%    11   16.9%   7.0%
Occasionally   19   76.0%   12.0%   33   71.7%   20.9%   13   59.1%   8.2%    45   69.2%   28.5%
Regularly      1    4.0%    0.6%    7    15.2%   4.4%    5    22.7%   3.2%    9    13.8%   5.7%

At every language level, participants are most inclined to read books "Occasionally". Moving
up from the "None" level, the share of students who read "Occasionally" rises gradually:
59.1%, 69.2%, 71.7%, and 76.0% respectively. Students with "None" language skills have the
highest share reading "Regularly", at 22.7%, and this share is generally lower at higher
language skill levels; in particular, only 4.0% of "Fluent" students read "Regularly". The
"Never" category further suggests no correlation between the two variables: "Fluent" students
have the highest percentage of "Never" reading (20.0%), compared with 13.0% for "Moderate"
students and 16.9% for "Slight" students. Overall, students' level of language skill shows no
clear relationship with frequency of reading.

b. For pairs of quantitative variables – Q4 & Q9, Q9 & Q10


Correlations are tested at the 0.05 significance level against the null hypothesis H0: the
correlation coefficient is 0. Hence, if Sig. is less than 0.05, we reject H0 and conclude that
the two variables are correlated.
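
The decision rule above can be sketched in code using the coefficients reported in the correlation tables of this subsection (r = -0.145 and r = 0.002, both with N = 158). This is an illustration under stated assumptions: the test statistic is the standard t statistic for a Pearson correlation, but the p-value uses a normal approximation to the t distribution (reasonable with 156 degrees of freedom), so it will be close to, not identical to, the Sig. values in the tables. The function name `correlation_test` is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_test(r, n):
    """Return (t statistic, approximate two-tailed p-value) for H0: correlation = 0."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    # Normal approximation to the t distribution; acceptable here because n - 2 is large.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


t_gpa_cell, p_gpa_cell = correlation_test(-0.145, 158)  # Q4 and Q9
t_gpa_work, p_gpa_work = correlation_test(0.002, 158)   # Q9 and Q10

for label, p in [("GPA vs cell minutes", p_gpa_cell), ("GPA vs working hours", p_gpa_work)]:
    decision = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"{label}: p is about {p:.3f}, {decision}")
```

In both cases the approximate p-value is above 0.05, so H0 is not rejected.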

Q4 and Q9: Relationship between Students’ GPA and Time Using Cellphone

Correlations
Cell Minutes GPA
Cell Minutes Pearson Correlation 1 -.145
Sig. (2-tailed) .070

N 158 158
GPA Pearson Correlation -.145 1
Sig. (2-tailed) .070

N 158 158

With a correlation coefficient of R = -0.145, the relationship between "GPA" and "Time using
cellphone" is evaluated as a "Negligible Correlation". The significance value (p = 0.070) is
higher than 0.05, so the null hypothesis cannot be rejected and the two variables are treated
as not correlated. Whether students achieve high or low scores, their phone use mostly ranges
from 0 to 2000 minutes.

Q9 and Q10: Relationship between Students’ GPA and Time Working at a paid job

Correlations

GPA Working Hours
GPA Pearson Correlation 1 .002
Sig. (2-tailed) .985

N 158 158
Working Hours Pearson Correlation .002 1
Sig. (2-tailed) .985

N 158 158

With a correlation coefficient of R = 0.002, the two variables "GPA" and "Working hours" also
show a "Negligible Correlation". In addition, the significance value p = 0.985 is higher than
0.05, so there is no evidence of correlation between the two variables. The points in the
scatterplot are randomly distributed with no discernible trend, so students' GPAs do not vary
positively or negatively with their hours worked.
3. Evaluation on the results of summary statistics
GPA
Language Skill   Mean    Mode      Standard Deviation
Fluent           3.233   3.000     .499
Moderate         3.283   3.300     .450
Slight           3.091   2.800 a   .509
None             3.130   3.600     .483
a. Multiple modes exist. The smallest value is shown

According to the table above, students with "Fluent" language skills have an average GPA of
Mean=3.233 with a moderate standard deviation of SD=0.499, and their most common score is only
Mode=3.0. Students with "Moderate" language skills have the highest GPA of all groups, with
Mean=3.283, and the lowest standard deviation, SD=0.45, meaning their scores are not widely
dispersed around the mean. Students with "Slight" language skills achieved the lowest GPA, with
Mean=3.091; their most frequent score is Mode=2.8, and their standard deviation, SD=0.509, is
higher than the others. Finally, those with no language skill ("None") have a moderate GPA,
Mean=3.130, which is not much lower than the GPA of those with better language skills, and a
relatively low standard deviation of SD=0.483. Among them, the score Mode=3.6 appeared the
most.

Working Hours
University   Mean   Mode     Standard Deviation
Colorado     5.3    .0       9.4
Oakland      23.2   25.0 a   13.2
a. Multiple modes exist. The smallest value is shown

Colorado's students seem to spend little time on paid jobs: their average working hours
(Mean=5.3) are more than 4 times lower than those of Oakland's students (Mean=23.2). Looking at
the mode, many Oakland students spend 25 hours a week (Mode=25) on their part-time jobs,
whereas the most common value among Colorado students is no paid work at all (Mode=0). The
standard deviation of hours worked is also lower for Colorado students than for Oakland
students (9.4 vs. 13.2), which means the time Colorado students spend working is less
volatile.
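
The summary statistics in this section (mean, mode, and standard deviation of a quantitative variable split by a qualitative one) can be sketched as follows. The data below is hypothetical and is not the survey data; the helper name `summarize` is also an assumption. The sketch takes the smallest mode when several exist, mirroring the table footnote.

```python
from statistics import mean, multimode, stdev

# Hypothetical weekly working hours for two groups; not the survey data.
hours_by_group = {
    "Group A": [0, 0, 10, 10, 20],
    "Group B": [20, 25, 25, 30],
}


def summarize(values):
    """Mean, smallest mode, and sample standard deviation of one group."""
    return {
        "mean": round(mean(values), 1),
        "mode": min(multimode(values)),  # smallest mode when several exist
        "sd": round(stdev(values), 1),
    }


summary = {group: summarize(values) for group, values in hours_by_group.items()}
for group, stats in summary.items():
    print(group, stats)
```

Group A has two modes (0 and 10), so only the smaller one is reported, which is the situation the footnote "Multiple modes exist" describes.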

4. Methods of communication
Descriptive Statistics: It is a basic tool that allows researchers to readily interpret sample
data through: (1) measures of location (mean, median, etc.) to determine where the data is
centred or where a trend exists, (2) measures of variability (skewness, interquartile range) to
determine the spread or diversity of a particular data collection, and (3) measures of
frequency (Kushwaha, 2020).

• Frequency Distribution: A frequency table synthesizes the information contained in a
categorical variable by counting the observations in each of its categories, which helps to
determine and analyze its distribution. Moreover, this type of table shows how data with
comparable characteristics is distributed, including its quantity and proportion, and whether
each category is larger or smaller than the other categories of the same variable.
• Summary statistic: This is used to evaluate a quantitative variable classified by another
variable. It includes measures such as mean, mode, and standard deviation.
• Characteristic measures used most in this work: (1) The sample mean is an excellent basis for
conclusions about the population mean, since it is an unbiased estimator of the population
mean. (2) When outliers make the mean a poor measure, the median is a better measure because it
is little affected by them. (3) A larger standard deviation signifies a more heterogeneous
distribution of the raw data. (4) Skewness is useful for determining whether a distribution is
symmetric and whether one side of the data has a long or short tail.
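
Point (2) in the list above, that the median is less affected by outliers than the mean, can be illustrated with a small sketch. The car ages below are hypothetical, with one entry mimicking a year typed in place of an age; the plausibility cut-off of 100 years is likewise an assumption chosen only for the example.

```python
from statistics import mean, median

# Hypothetical car ages in years; the last entry mimics a year typed in place of an age.
car_ages = [2, 3, 4, 5, 6, 7, 8, 2003]

print(mean(car_ages))    # pulled far upward by the single outlier
print(median(car_ages))  # stays near the typical values

# Crude plausibility filter (an assumption for illustration only).
clean = [age for age in car_ages if age <= 100]
print(mean(clean), median(clean))
```

With the outlier present the mean and median disagree strongly; once it is filtered out they coincide, which is the behaviour described for the Car Age variable.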
Pearson's Correlation Coefficient: This analyzes the linear relationship between two variables
and the degree of association between them (strong or weak), that is, it describes the movement
of one variable in relation to another (FERNANDO, 2021).

5. Utilization of different types of charts


Frequency table: Frequency tables are used to analyze the distribution of data by determining
or comparing the number of times each value occurs in the same qualitative variable, that is,
the data distribution in nominal and ordinal variables such as university, accommodation,
credit card, etc. They are especially helpful for summarizing large data sets (as in this
report) in a succinct way or for calculating probabilities (Reid, 2018). However, to examine
the connection between one qualitative variable and another while studying the features of each
variable at the same time, a more complex statistic is needed.

Descriptive Table: In this report, it is mainly devoted to interpreting the data for
quantitative variables such as GPA, working hours, time using cell phone, etc. In contrast to
qualitative variables, quantitative data take distinct numeric values within the same variable,
allowing measures of location or dispersion to be calculated for each variable. Therefore, a
descriptive table is used to make the characteristics of this data type (mean, minimum or
maximum value, skewness, etc.) clearly accessible.

Charts:

• Bar Chart: provides a far more comprehensive overview of the data than a table alone.
This does not make the table redundant, because the table is needed to draw the bar
graph. Bar graphs are also useful for comparing multiple categories of one variable when
the author wants to gauge a trend.
• Pie Chart: useful for showing the relative frequency of a small number of categories, but
not appropriate for a variable with a large number of categories. Thus, the author
presents variables with 2 or 3 categories, such as Q25, Q26 and Q27, as pie charts for
greater clarity.
• Clustered bar chart: used to assess how a second categorical variable varies with each
level of the first, which means it divides data points across two categorical variables
rather than one. Relationships are easier to study when each value of a variable is shown
with a consistent colour and arrangement in every group.
• Scatter Plot: can display large amounts of data and reveal correlations between two
variables, such as clustering effects. Data points are not labelled in this chart, so exact
values are difficult to determine; the author therefore uses the Pearson correlation
coefficient alongside it.
Cross tabulation: As shown in the task above, a cross tab assists the author in examining the
association between two variables, and it makes tendencies and probabilities in the data simpler
to demonstrate. With a large data set like the one in the brief, raw data are generally
overwhelming and can lead to conflicting conclusions, so cross tabulation, which distributes the
entire data set into representative subgroups, helps to simplify the data, makes them easier to
manage and lessens the probability of assessment mistakes.
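
The way a cross tab splits the data into subgroups can be illustrated with a short Python sketch. The records are hypothetical, and `crosstab_column_percent` is an illustrative helper; it reports each cell as a percentage of its column, the same layout as a pivot table of column percentages.

```python
from collections import Counter


def crosstab_column_percent(rows, cols):
    """Return {row_label: {col_label: percentage of that column}}."""
    cell_counts = Counter(zip(rows, cols))
    col_totals = Counter(cols)
    row_labels = sorted(set(rows))
    col_labels = sorted(set(cols))
    return {
        r: {c: 100 * cell_counts[(r, c)] / col_totals[c] for c in col_labels}
        for r in row_labels
    }


# Hypothetical records for illustration only.
status = ["Working", "Working", "Not working",
          "Working", "Not working", "Not working"]
university = ["Oakland", "Oakland", "Oakland",
              "Colorado", "Colorado", "Colorado"]

table = crosstab_column_percent(status, university)
for row_label, cells in table.items():
    print(row_label, {col: f"{pct:.2f}%" for col, pct in cells.items()})
```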

Correlation table: It is useful for studying the relationship between two variables, as
analyzed above. However, it merely shows how one variable moves with the other, without giving
a reason for the relationship or indicating which variable affects the other.

Part C. Analyse and evaluate business data
1. T-test to compare variables
Compare quantitative variables Q9, Q10 classified by qualitative variable Q1

Group Statistics
               University   N    Mean      Std. Deviation   Std. Error Mean
GPA            Oakland      61   3.15525   .468048          .059927
               Colorado     97   3.18710   .505444          .051320
Working Hours  Oakland      61   23.230    13.1889          1.6887
               Colorado     97   5.253     9.3648           .9508

Independent Samples Test

Levene's Test for Equality of Variances: F, Sig.
t-test for Equality of Means: t, df, Sig. (2-tailed), Mean Difference, Std. Error Difference,
95% Confidence Interval of the Difference (Lower, Upper)

                                              F      Sig.   t       df       Sig.         Mean        Std. Error  Lower     Upper
                                                                             (2-tailed)   Difference  Difference
GPA            Equal variances assumed        .027   .869   -.397   156      .692         -.031857    .080299     -.190471  .126757
               Equal variances not assumed                  -.404   134.921  .687         -.031857    .078899     -.187896  .124181
Working Hours  Equal variances assumed        7.687  .006   10.006  156      .000         17.9769     1.7965      14.4282   21.5256
               Equal variances not assumed                  9.276   97.926   .000         17.9769     1.9380      14.1311   21.8228

All tests are carried out at the 95% confidence level, so the significance level is α = 0.05.

GPA Variable

H0: σ₁² (Oakland) = σ₂² (Colorado)

H1: σ₁² (Oakland) ≠ σ₂² (Colorado)

In Levene’s Test, F = 0.027 and Sig(F) = 0.869 > α, so the conclusion is “Do not reject H0”,
which means the “Equal variances assumed” row is used.

H0: Mean GPA (Oakland) = Mean GPA (Colorado)

H1: Mean GPA (Oakland) ≠ Mean GPA (Colorado)

With Sig(2-tailed) = 0.692 > α, the conclusion is “Do not reject H0”: there is no significant
difference in mean GPA. In other words, there is no evidence of a difference in GPA between the
two universities. Although Colorado has a higher sample mean GPA than Oakland in the Group
Statistics table, the T-test shows that this difference is not statistically significant at the
5% level.
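
The “Equal variances assumed” row for GPA can be checked from the Group Statistics alone. The following is a minimal sketch of the pooled-variance t statistic; `pooled_t` is an illustrative helper name, and the inputs are the rounded means, standard deviations and sample sizes reported in the table.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance (equal variances assumed) t statistic, df and standard error."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df, se


# GPA: Oakland (group 1) against Colorado (group 2), from the Group Statistics table.
t_stat, df, se = pooled_t(3.15525, 0.468048, 61, 3.18710, 0.505444, 97)
print(f"t = {t_stat:.3f}, df = {df}, SE = {se:.6f}")
```

Up to rounding of the inputs, the result agrees with the reported t = -.397, df = 156 and Std. Error Difference = .080299.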

Working Hours Variable

H0: σ₁² (Oakland) = σ₂² (Colorado)

H1: σ₁² (Oakland) ≠ σ₂² (Colorado)

In Levene’s Test, F = 7.687 and Sig(F) = 0.006 < α, so the conclusion is “Reject H0”,
which means the “Equal variances not assumed” row is used.

H0: Mean Working Hours (Oakland) = Mean Working Hours (Colorado)

H1: Mean Working Hours (Oakland) ≠ Mean Working Hours (Colorado)

With Sig(2-tailed) reported as .000 (p < 0.001) < α, the conclusion is “Reject H0”: the mean
time spent in paid jobs by Oakland’s students is not equal to that of Colorado’s students. This
agrees with the sample means above (23.230 hours for Oakland against 5.253 hours for Colorado),
and the T-test indicates that the difference also holds for the population means. Therefore, it
can be concluded that working hours at the two universities are unequal.
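
The “Equal variances not assumed” row corresponds to Welch's t-test, which uses separate variances and the Welch-Satterthwaite degrees of freedom. A minimal sketch from the Group Statistics is shown below; `welch_t` is an illustrative helper name.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch (unequal variances) t statistic, approximate df and standard error."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (mean1 - mean2) / se, df, se


# Working Hours: Oakland (group 1) against Colorado (group 2).
t_stat, df, se = welch_t(23.230, 13.1889, 61, 5.253, 9.3648, 97)
print(f"t = {t_stat:.3f}, df = {df:.3f}, SE = {se:.4f}")
```

Up to rounding of the inputs, this agrees with the reported t = 9.276, df = 97.926 and Std. Error Difference = 1.9380.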

2. Z-test to compare variables


Compare proportions of “Working” students between 2 Universities

H0: p1 = p2

H1: p1 ≠ p2

Here p1 is the observed proportion of working students at Oakland, with sample size n1, and p2
is the observed proportion of working students at Colorado, with sample size n2. The Z-test of
proportions is based on the formula

$$Z_{stat} = \frac{p_1 - p_2}{\sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

with the overall sample proportion

$$p = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2}$$

Percentage of students by working status and University

Working Hours   Colorado   Oakland   Grand Total
Not Working     65.98%     13.11%    45.57%
Working         34.02%     86.89%    54.43%
Grand Total     100.00%    100.00%   100.00%

At the 95% confidence level (α = 5%), the critical z-score is Zα/2 = 1.96. Solving the formula
gives Zstat = 6.4957. By the rejection rule for a two-tailed test, |Zstat| > Zα/2, so we can
reject the null hypothesis and accept the alternative: the proportions of working students at
Colorado and Oakland are not equal.
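
The calculation can be sketched in Python using the pooled two-proportion Z statistic. The proportions are the rounded percentages from the cross tabulation and the sample sizes (61 and 97) are taken from the Group Statistics table; `two_proportion_z` is an illustrative helper name.

```python
import math
from statistics import NormalDist


def two_proportion_z(p1, n1, p2, n2):
    """Z statistic for H0: p1 = p2, using the pooled sample proportion."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


alpha = 0.05
z_stat = two_proportion_z(0.8689, 61, 0.3402, 97)   # Oakland, Colorado
z_crit = NormalDist().inv_cdf(1 - alpha / 2)        # two-tailed critical value

print(f"Z = {z_stat:.3f}, critical value = {z_crit:.2f}")
print("Reject H0" if abs(z_stat) > z_crit else "Do not reject H0")
```

Because the percentages are rounded to two decimals, the statistic differs from the reported 6.4957 only in the third decimal place.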

3. Regression Model
Coefficients (a)
                               Unstandardized Coefficients   Standardized Coefficients
Model                          B           Std. Error        Beta                        t        Sig.
1  (Constant)                  3.220       .060                                          53.932   .000
   Working Hours               .000        .003              .003                        .043     .966
   Time using Cellphone        -5.498E-5   .000              -.146                       -1.827   .070
   Using Electronic Textbook   -.030       .092              -.026                       -.322    .748

a. Dependent Variable: GPA

• Regression Line:

GPA = β0 + β1 × Working Hours + β2 × Time Using Cellphone + β3 × Using Electronic Textbook + ɛ

Estimated GPA = 3.220 + 0.000 × Working Hours + (-5.498E-5) × Time Using Cellphone + (-0.030) × Using Electronic Textbook

• This table describes the degree of influence of each of the 3 independent variables on
the dependent variable GPA:
(1) Working Hours: with the other variables held constant, if a student's working hours
increase by 1 hour, the average GPA does not change (the coefficient is 0.000 to three
decimal places). This agrees with the conclusion from the correlation coefficient in Part B
that Working Hours has no apparent effect on students' GPA.
(2) Time Using Cellphone: with the other variables held constant, if a student's time using
the cell phone increases by 1 minute, the average GPA decreases by 5.498E-5 points.
(3) Using Electronic Textbook: with the other variables held constant, the average GPA of
those who have used an electronic textbook is 0.030 points lower than that of those who
have never used one.
• The column labeled Standardized Coefficients Beta illustrates the extent of each
independent variable's impact on the dependent variable (GPA). Sorted from strongest to
weakest by absolute value, the influence of the independent variables on GPA is: Time using
cell phone (0.146), Using Electronic Textbook (0.026), Working Hours (0.003).
• For the population, a further test is needed to assess the significance of the values
calculated above. A T-test for significance is used, with the hypotheses
H0: βi = 0
H1: βi ≠ 0
As shown in the Sig. column for all 3 independent variables, the p-values of Working Hours
(0.966), Time Using Cellphone (0.070) and Using Electronic Textbook (0.748) are higher than
α (0.05), so H0 is not rejected. That is, there is no evidence that any of the included
independent variables has an impact on students' GPA.
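
The interpretation of the coefficients can be illustrated with a small Python sketch of the fitted equation. The coefficients are the rounded values from the Coefficients table; `predict_gpa` is an illustrative helper name, and coding the electronic-textbook variable as 1 for "has used" and 0 for "has never used" is an assumption based on the interpretation above.

```python
# Rounded unstandardized coefficients (B) from the Coefficients table.
COEFFICIENTS = {
    "constant": 3.220,
    "working_hours": 0.000,         # reported as .000 (three decimal places)
    "cellphone_minutes": -5.498e-5,
    "electronic_textbook": -0.030,  # assumed dummy: 1 = has used, 0 = never used
}


def predict_gpa(working_hours, cellphone_minutes, electronic_textbook):
    """Fitted GPA from the estimated regression line."""
    c = COEFFICIENTS
    return (c["constant"]
            + c["working_hours"] * working_hours
            + c["cellphone_minutes"] * cellphone_minutes
            + c["electronic_textbook"] * electronic_textbook)


baseline = predict_gpa(working_hours=10, cellphone_minutes=0, electronic_textbook=0)
more_phone = predict_gpa(working_hours=10, cellphone_minutes=1000, electronic_textbook=0)
print(f"Change in fitted GPA for 1000 extra minutes: {more_phone - baseline:.5f}")
```

Since none of the coefficients is statistically significant, such fitted values describe the sample only and should not be read as reliable predictions.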

Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .147 (a)   .022       .002                .489468
a. Predictors: (Constant), Using Electronic Textbook, Time using Cellphone Minutes, Working Hours

The regression equation based on the independent variables explains 2.2% of the variation in the
dependent variable (GPA), as represented by R Square. For a clearer assessment, the Adjusted R
Square, which corrects for the number of predictors, is 0.2%, which means that in reality only
about 0.2% of the variation in GPA is explained by the independent variables.

ANOVA (a)
Model           Sum of Squares   df    Mean Square   F       Sig.
1  Regression   .812             3     .271          1.130   .339 (b)
   Residual     36.895           154   .240
   Total        37.708           157
a. Dependent Variable: GPA
b. Predictors: (Constant), Using Electronic Textbook, Time using Cellphone Minutes, Working Hours

H0: β1 = β2 = β3 = 0
H1: One or more parameters is unequal to 0
The Sig. column gives the p-value (0.339) > α (0.05), so H0 is not rejected. This indicates that
the model as a whole has no significant predictive capability, so the F-value itself need not be
examined further.

4. Comparison of Summary statistics in Part B & Hypothesis Testing in Part C


Both methods can be used to analyze and evaluate data. Summary statistics are descriptive
statistics that synthesize the essential and outstanding information about a sample's
characteristics. Hypothesis testing, by contrast, is inferential statistics: it uses data from a
sample to draw conclusions about the population, which reach far beyond the available data. The
representative figures provided by summary statistics (sample mean, sample size, sample standard
deviation, etc.) are the inputs that make hypothesis testing possible.

Specifically, in this coursework, the T-test and Z-test are applied with an accepted significance
level, in most cases 5%. From the summary statistics, the sample mean GPAs of the two schools are
recorded as unequal. However, the T-test with a significance level of 0.05 shows no
statistically significant difference in GPA between Colorado and Oakland.

5. Comparison using Regression Analysis in Part C and correlation coefficients in Part B

Although their purposes and results differ, correlation and regression complement each other in
research. Both techniques can show the direction of a relationship (positive or negative), its
strength (strong or weak) and the influence of variables (Tanni, et al., 2020).

Correlation is used to show the relationship between any two variables. When a more in-depth
examination is required of how an independent variable affects the dependent one, regression
analysis is a more effective and reasonable method. In correlation, X and Y can be interchanged
with the same outcome; this is not valid in regression analysis. Regression analysis also
provides more detailed information that correlation does not reflect. (1) Regression analysis
specifies a direction of dependence, treating one variable as the response to the other,
although it does not by itself prove cause and effect. (2) Regression analysis is a foundation
for generating predictions and selecting an appropriate optimization method. Whereas correlation
analysis merely summarizes the distribution of data on a scatter plot, regression analysis is
depicted as a line with equation Y = a + bX, which allows studying how the dependent variable Y
responds when the independent variable X increases or decreases by 1 unit.
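
The point that X and Y are interchangeable in correlation but not in regression can be shown with a short Python sketch. The data are hypothetical, and `pearson_r` and `ols_slope` are illustrative helper names.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient (symmetric in xs and ys)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(xs, ys):
    """Slope b of the least-squares line ys = a + b * xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 2.9, 4.2, 4.8, 6.5]

print("r(x, y) =", pearson_r(x, y), " r(y, x) =", pearson_r(y, x))
print("slope of y on x =", ols_slope(x, y))
print("slope of x on y =", ols_slope(y, x))
```

Swapping the variables leaves the correlation unchanged but gives a different regression slope, because regression treats one variable as the response.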

Specifically, as analyzed in Part B, the correlation between GPA and Time using cellphone is
only shown to be “Negligible Correlation”, without indicating which variable affects which.
With regression analysis in Part C, the two variables are still related, although the impact of
Time using cellphone on GPA is extremely low (standardized Beta of -0.146) and not statistically
significant at the 5% level (Sig. = 0.070). If students' time using the cellphone increases by
1 minute, the average GPA decreases by 5.498E-5 points.

III. CONCLUSION
In summary, numerous computation and evaluation methods have been implemented in this research
work to assess a range of data on students from Colorado and Oakland. In addition, these methods
and tools help to evaluate the relationships between variables in the data table.

IV. REFERENCES

1. FERNANDO, J., 2021. Correlation Coefficient. [Online]
Available at: https://www.investopedia.com/terms/c/correlationcoefficient.asp
2. HILLIER, W., 2021. Descriptive Vs. Inferential Statistics: What's The Difference. [Online]
Available at: https://careerfoundry.com/en/blog/data-analytics/inferential-vs-descriptive-statistics/
3. Kabir, S. M. S., 2016. METHODS OF DATA COLLECTION. In: Basic Guidelines for
Research: An Introductory Approach for All Disciplines. Bangladesh: Book Zone
Publication.
4. Kushwaha, N., 2020. Descriptive statistics summary for Data science. [Online]
Available at: https://medium.com/analytics-vidhya/descriptive-statistics-acba9c2f8e5b
5. Kwantlen Polytechnic University, 2020. INTEGRATED PRIMARY & SECONDARY
RESEARCH. In: An Open Guide to IMC. s.l.:s.n.
6. Reid, A., 2018. Advantages & Disadvantages of a Frequency Table. [Online].
7. Tanni, S. E., Patino, C. M. & Ferreira, J. C., 2020. Correlation vs. regression in
association.
