All Merged Revised

1. Data on 200 YouTube advertising budgets and the respective sales earnings were collected. You
are asked to analyze whether increasing the advertising budget would increase the sales. The
following dataset is given to you:
str(marketing)
'data.frame': 200 obs. of 4 variables:
$ youtube : num 276.1 53.4 20.6 181.8 217 ...
$ sales : num 26.5 12.5 11.2 22.2 15.5 ...

The youtube column shows the advertising budget spent, and the sales column shows the earnings. All
these numbers are in thousands of dollars.
The first thing that you need to do is [Answer: Determine whether there is a strong relationship between the spending on YouTube advertisement service and the sales earning.]
Therefore, you plotted the advertising budget and its respective sales:
After that, you ran a [Answer: Pearson correlation test], and the result is R = 0.7822.
Based on the value of R, you know that [Answer: there is a strong positive relationship between the spending on YouTube advertisement service and the sales earning.]
Therefore, you decided to summarize and study the relationship between the advertising budget and
the sales earning by using [Answer: Simple Linear Regression]. The results came as
follows:
Call:
lm(formula = sales ~ youtube, data = marketing)
Residuals:
Min 1Q Median 3Q Max
-10.0632 -2.3454 -0.2295 2.4805 8.6548
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.439112 0.549412 15.36 <2e-16 ***
youtube 0.047537 0.002691 17.67 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.91 on 198 degrees of freedom
Multiple R-squared: 0.6119, Adjusted R-squared: 0.6099
F-statistic: 312.1 on 1 and 198 DF, p-value: < 2.2e-16

From the model, you could conclude that:
For an advertising budget equal to zero, a company may expect sales of USD 8,440.
The budget spent on advertising is not a significant predictor of sales.
For each dollar spent on advertising, a company could expect sales earnings of USD 47.537.
The intercept 8.439 is a strong predictor of the model.
For each dollar spent on advertising, a company could expect sales earnings of USD 8,440.
Based on the model, for a company that spent USD 1,000 on YouTube advertising, the company could
expect sales earnings of USD [Answer: 55976.112]
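The arithmetic behind the recorded answer can be sketched in R. This is only an illustration: the coefficients are copied from the lm() summary above, the variable names are made up here, and the first calculation follows the reading used by the answer options (a USD 8,439.112 intercept plus USD 47.537 of sales per advertising dollar). If youtube is instead entered in thousands of dollars, as the column description says, USD 1,000 corresponds to youtube = 1; that alternative is shown for comparison and is an assumption, not part of the recorded answer.

```r
# Coefficients copied from the lm() summary (sales ~ youtube)
intercept <- 8.439112   # thousands of USD
slope     <- 0.047537   # change in sales per unit of youtube

# Reading used by the answer options: intercept converted to USD,
# USD 47.537 of sales per advertising dollar, budget of USD 1,000.
budget_usd <- 1000
pred_option_reading <- intercept * 1000 + (slope * 1000) * budget_usd
print(pred_option_reading)   # 55976.11 (55976.112 before rounding by print)

# Alternative reading (assumption): youtube is in thousands of USD,
# so USD 1,000 is youtube = 1 and the prediction is in thousands of USD.
pred_thousands <- intercept + slope * 1
print(pred_thousands)        # 8.486649

# Pearson's r recovered from the reported R-squared (the slope is positive)
r <- sqrt(0.6119)
print(round(r, 4))           # 0.7822
```

In a fitted session the same prediction would normally come from predict() on the model object; the manual form is used here because only the printed coefficients are available.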
2. Which function(s) is (are) used to store a workspace in R?
Select one or more:
a. write.csv()
b. save.image()
c. save.csv()
d. save()
e. saveRDS()
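For orientation, here is a small sketch of what the base R functions among these options do. The object names and temporary files are hypothetical and exist only for this illustration. Note that write.csv() writes a data frame to a text file rather than saving R objects, and base R has no save.csv() function.

```r
# Hypothetical objects, created only for this illustration
budget <- c(276.1, 53.4, 20.6)
label  <- "demo"

ws_file  <- tempfile(fileext = ".RData")
obj_file <- tempfile(fileext = ".RData")
rds_file <- tempfile(fileext = ".rds")

save.image(file = ws_file)             # every object in the global environment
save(budget, label, file = obj_file)   # only the named objects
saveRDS(budget, file = rds_file)       # exactly one object, stored without its name

# Restore: load() recreates objects under their original names,
# readRDS() returns the object so it can be assigned to any name.
restored <- new.env()
load(obj_file, envir = restored)
budget_copy <- readRDS(rds_file)
print(ls(restored))
print(identical(budget_copy, budget))
```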
3. A statistical test that compares or tests the suitability of observations against expectations or
their theoretical frequencies is
Select one:
a. ANOVA
b. Wilcoxon Rank Sum Test
c. t-test
d. Test of Independence
e. Goodness-of-Fit
4. You are interested in learning the favorite programming languages of first-year Indonesian Informatics
and/or Computer Science undergraduate students. To achieve this, you asked your
high school classmates who were admitted to the specified program.
a. The data are collected properly and bias is minimized. [Answer: TRUE]
b. Because a variable is a characteristic of each individual on which data is collected, which of the
following are variables that suit well with the research question?
number of students who chose particular programming as their favorite one
chosen programming language
gender
the respondent's final score in algorithm and programming course
c. Which chart or graph would be appropriate to display the concerned variable(s)?
a boxplot
a time plot
a bar graph
a pie chart
Question 5
Which one of the following is best treated as an ordinal variable?
Select one:
a. salary
b. educational degree
c. phone number
d. city of residence
e. hair color
Question 6
A total of 1,341 undergraduate students were surveyed to learn the preferred teaching-and-
learning method of all UNSRAT students. There are three teaching-and-learning methods:
online, offline, or blended. The answers were then tabulated, and the frequency of each method is
presented in the report. Match the item/condition from the example above with the right term!
Statistics → frequency
Population → UNSRAT students
Samples → the surveyed 1,341 undergraduate students
Variable → preferred teaching-and-learning method
Question 7
What can be learned from a histogram and/or a stem-and-leaf display?
Select one or more:
a. Data distribution
b. Central tendency
c. Data symmetricity
d. Distribution gap
e. Outliers
Question 8
You are assigned to study whether there is a relationship between video game publishers and the video
game genres. You have a dataset with the following structure:
str(vgs)
$ Rank : int 1 3 4 8 11 12 14 15 16 17 ...
$ Name : Factor w/ 8427 levels ".hack: Sekai no Mukou ni + Versus",..: 8048 4013 8049 8046 5006 4012 8042 8043 3598 2681 ...
$ Platform : Factor w/ 10 levels "DS","PC","PS",..: 8 8 8 8 1 1 8 8 9 5 ...
$ Year : Factor w/ 29 levels "1985","1988",..: 16 18 19 16 15 15 17 19 20 23 ...
$ Genre : Factor w/ 10 levels "Action","Adventure",..: 9 5 9 4 8 5 9 9 4 1 ...
$ Publisher : Factor w/ 467 levels "10TACLE Studios",..: 297 297 297 297 297 297 297 297 266 402 ...
$ NA_Sales : num 41.49 15.85 15.75 14.03 9.07 ...
$ EU_Sales : num 29 12.9 11 9.2 11 ...
$ JP_Sales : num 3.77 3.79 3.28 2.93 1.93 4.13 3.6 2.53 0.24 0.97 ...
$ Other_Sales : num 8.46 3.31 2.96 2.85 2.75 1.92 2.15 1.79 1.67 4.14 ...
$ Global_Sales: num 82.7 35.8 33 29 24.8 ...
You explored the data by making a barplot that shows the grouped distribution, and it came as follows:
To achieve the goal of the study, you create a [Answer: Cross-tabulation] table.
Genre
Platform Action Adventure Fighting Misc Racing Role-Playing Shooter Simulation
DS 343 240 36 393 67 200 42 285
PC 165 65 6 24 60 104 148 115

PS 157 69 108 76 145 97 96 60
PS2 348 196 150 222 216 187 160 90
PS3 380 74 76 124 92 119 156 31
PS4 122 19 17 15 17 47 34 5
PSP 222 213 74 106 65 192 37 29
Wii 238 84 42 280 94 35 66 87
X360 324 47 65 126 105 76 203 40
XB 155 26 48 46 123 23 132 24
Genre
Platform Sports Strategy
DS 148 79
PC 49 188
PS 222 70
PS2 400 71
PS3 213 24
PS4 43 5
PSP 135 60
Wii 261 25
X360 220 28
XB 170 21
At a 95% confidence level, you ran a [Answer: Chi-square Test of Independence], and the result
came as follows:
data: [HIDDEN]
X-squared = 2873.1, df = 81, p-value < 2.2e-16

Therefore, according to the result, then it can be concluded that
All platforms share the same amount of video game publications on each genre.
There are certain video game genres that commonly published for specific platforms.
There is a significant relationship between video-game platform and the video-game genres.
There is no significant relationship between genres and the platform used.
Question 9
The following table contains a subset of the results from a survey about how the first year UNSRAT
undergraduate students access e-Learning.
questionnaire_code program access_mean
STU001 informatics personal notebook/PC
STU002 civil shared notebook/PC
STU003 law NA
STU004 medical personal tablet
Match the item/condition from the example above with the right term!
Element → STU001
Observation → personal notebook/PC
Variable → access_mean
Question 10
Background
You are assigned to analyze a dataset that contains the scores of students in a class. There are 3
quizzes given to them. Is there any difference in scores between the quizzes?
Data Exploration
The structure of the dataset is as follow
str(qrt)
$ student: Factor w/ 35 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
$ score : num 0 23.8 26 20.4 12.1 ...
$ test : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
You then plotted a [Answer: boxplot], and the result came as follows:
Since you were comparing scores in three different quizzes with the same participants (students),
you need the proper method. Therefore, you need to decide which method to use. So you start with
checking the [Answer: distribution normality] of the performance score on each group by using
the [Answer: Shapiro-Wilk Test]. The p-value for each tested group is
shown in the following table:
quiz p-value
1 0.001545939
2 0.016061633
3 0.003481896
Statistical Tests
Based on it, you decide to use a [Answer: non-parametric method] to find whether there is any difference
between those quizzes. Due to the nature of the problem, you ran
a [Answer: Friedman Test], and the result is as follows:
data: score and test and student
[HIDDEN] chi-squared = 30.778, df = 2, p-value = 2.073e-07

Based on the result, you decide
to [Answer: continue with further tests to find in which quiz students tend to achieve better scores].
Therefore you ran a [Answer: Pairwise Wilcoxon RST], and the results came as
follows:
[NAME of TEST HIDDEN]
data: score and test
1 2
2 0.00060 -
3 2.1e-05 0.00046
P value adjustment method: bonferroni

Conclusion
Based on the results of the statistical tests, then you conclude that
There is no significantly different score achievements in all three quizzes
Students tend to achieve higher scores in Test 3, followed with Test 2, yet the differences are not
significant
Scores in Test 1 and 3 are significantly different, but Tests 1 and 2 are not, and so with Tests 2 and 3
Students tend to achieve significantly higher scores in Test 3, followed with Test 2. Test 1 scores are
significantly lower than the other two tests.
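The sequence of steps described above (normality check, Friedman test, pairwise post-hoc test) could be sketched in R roughly as follows. The data frame qrt_sim is simulated here and is not the quiz dataset, so its printed numbers will differ from the output quoted above; the paired option in the post-hoc call is an assumption that fits the repeated-measures design.

```r
set.seed(1)   # simulated scores, NOT the quiz data

n <- 35
qrt_sim <- data.frame(
  student = factor(rep(1:n, times = 3)),
  test    = factor(rep(1:3, each = n)),
  score   = c(rnorm(n, 20, 5), rnorm(n, 25, 5), rnorm(n, 30, 5))
)

# 1. Normality of the scores within each quiz
shapiro_p <- tapply(qrt_sim$score, qrt_sim$test,
                    function(x) shapiro.test(x)$p.value)
print(shapiro_p)

# 2. Friedman test: score by test, blocked on student (repeated measures)
fried <- friedman.test(score ~ test | student, data = qrt_sim)
print(fried)

# 3. Post-hoc pairwise Wilcoxon tests with Bonferroni adjustment.
#    Rows are ordered by test and then by student, so the pairs line up.
posthoc <- pairwise.wilcox.test(qrt_sim$score, qrt_sim$test,
                                paired = TRUE,
                                p.adjust.method = "bonferroni")
print(posthoc)
```

With three quizzes the Friedman test has 3 - 1 = 2 degrees of freedom, matching the df = 2 in the quoted result.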

11. You are interested in knowing the percentage of how first-year UNSRAT undergraduate students
access e-Learning. To estimate the percentage, you survey 500 randomly selected students
and determine the means of access used by the students.
Population → First year UNSRAT undergraduate students
Statistics → Percentage
Samples → 500 randomly selected first year UNSRAT undergraduate students
Parameter → The means used to access e-Learning
12. Background
You are assigned to analyze a dataset that contains the performance score measures of participants at
two-time points. The aim of this study is to evaluate the effect of gender and stress on performance
scores. Is there any difference in performance between different stress levels? If any, which
one yields the best/worst performance score?
Data Exploration
str(performance)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 60 obs. of 5 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
$ gender: Factor w/ 2 levels "male","female": 1 1 1 1 1 1 1 1 1 1 ...
$ stress: Factor w/ 3 levels "low","moderate",..: 1 1 1 1 1 1 1 1 1 1 ...
$ t1 : num 5.96 5.51 5.63 5.71 5.74 ...
$ t2 : num 5.58 5.82 5.47 5.79 5.72 ...

Boundary
Since the performance is measured twice, in this problem we only focus on the first measurement (the t1
column).
You then plotted a [Answer: boxplot], and the result came as follows:
Since you were comparing a variable in three different groups, you need the proper method.
Therefore, you need to decide which method to use. So you start with checking
the [Answer] of the performance score on each group by using
the [Answer: Shapiro-Wilk Test]. The p-value for each tested group is shown in the following table:
stress level p-value
low 0.11428304
moderate 0.07023834
high 0.92983350
Statistical Tests
Based on it, you decide to use a [Answer: parametric method]
to find whether there is any difference between the stress levels on the performance. Due to the nature of the
problem, you ran a [Answer: One-way ANOVA], and the result is as follows:
Df Sum Sq Mean Sq F value Pr(>F)
stress 2 0.8235 0.4117 14.5 8.13e-06 ***
Residuals 57 1.6190 0.0284
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Based on the result, you decide to [Answer: continue with further tests to find which stress level has significantly different impact on performance].
Therefore you ran a [Answer: TukeyHSD], and the results came as follows:

95% family-wise confidence level
Fit: [HIDDEN]
$stress
diff lwr upr p adj
moderate-low 0.1052102 -0.02303774 0.2334582 0.1279077
high-low -0.1786052 -0.30685319 -0.0503573 0.0040329
high-moderate -0.2838155 -0.41206340 -0.1555675 0.0000053
Conclusion
employees with a high stress level have significantly lower performance, followed by those with
moderate, and then low stress levels
there are no significant performance differences across the stress levels
the experiment is a violation of human rights
employees with a high stress level tend to have significantly lower performance compared to
employees with moderate and low stress levels
there is no significant performance difference between employees with moderate and low stress
levels
employees with a moderate stress level tend to have significantly higher performance than those with
low and/or high stress levels
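A minimal R sketch of this parametric workflow (Shapiro-Wilk per group, one-way ANOVA, Tukey HSD) is given below. The data frame perf_sim is simulated and only mimics the shape of the performance data (60 rows, three stress levels), so its output will not match the tables above. The last lines recompute the F value from the sums of squares quoted in the ANOVA table.

```r
set.seed(42)   # simulated data, NOT the performance dataset

perf_sim <- data.frame(
  stress = factor(rep(c("low", "moderate", "high"), each = 20),
                  levels = c("low", "moderate", "high")),
  t1     = c(rnorm(20, 5.7, 0.17), rnorm(20, 5.8, 0.17), rnorm(20, 5.5, 0.17))
)

# 1. Shapiro-Wilk normality check per stress level
print(tapply(perf_sim$t1, perf_sim$stress,
             function(x) shapiro.test(x)$p.value))

# 2. One-way ANOVA
fit <- aov(t1 ~ stress, data = perf_sim)
print(summary(fit))

# 3. Tukey HSD post-hoc comparisons
tuk <- TukeyHSD(fit, conf.level = 0.95)
print(tuk)

# F value recomputed from the ANOVA table quoted above:
# (Sum Sq / Df) for stress divided by (Sum Sq / Df) for residuals
f_reported <- (0.8235 / 2) / (1.6190 / 57)
print(round(f_reported, 1))   # 14.5
```

In the Tukey output, a pair differs significantly at the 5% level when its adjusted p-value is below 0.05, which is the same as its interval (lwr, upr) excluding zero.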
13. You are interested in learning the favorite programming languages of first-year Indonesian
Informatics and/or Computer Science undergraduate students. To achieve this, you asked
your high school classmates who were admitted to the specified program.
a. The data are collected properly and bias is minimized. [Answer: TRUE]
gender
14. Which chart or graph would be appropriate to display the concerned variable(s)?
a time plot
a pie chart
a boxplot
a bar graph
15. You are assigned to study whether there is a relationship between the category and the content
rating of selected apps in Google PlayStore. You have a dataset with the following structure:
str(googleplaystore)
$ App : Factor w/ 3088 levels "¡Ay Caramba!",..: 2472 2679 654 2617 580 1701 2595 790 2762
2343 ...
$ Category : Factor w/ 3 levels "FAMILY","GAME",..: 2 2 2 2 2 2 2 2 2 2 ...
$ Rating : num 4.5 4.5 4.4 4.7 4.5 4.2 4.4 4.6 4.3 4.3 ...
$ Reviews : Factor w/ 2379 levels "0","1","10","100",..: 1556 1061 828 969 412 1363 1730 865 2175
68 ...
$ Size : Factor w/ 219 levels "1.0M","1.1M",..: 143 165 163 43 93 45 219 215 136 45 ...
$ Installs : Factor w/ 21 levels "0","0+","1,000,000,000+",..: 10 3 19 7 7 16 10 10 19 19 ...
$ Type : Factor w/ 3 levels "Free","NaN","Paid": 1 1 1 1 1 1 1 1 1 1 ...
$ Price : Factor w/ 38 levels "$0.99","$1.04",..: 38 38 38 38 38 38 38 38 38 38 ...
$ Content.Rating: Factor w/ 4 levels "Everyone","Everyone 10+",..: 2 2 1 1 1 1 1 2 1 1 ...

$ Genres : Factor w/ 85 levels "Action","Action;Action & Adventure",..: 4 7 23 19 23 29 1 77 1 23 ...
$ Last.Updated : Factor w/ 921 levels "April 1, 2017",..: 460 396 465 78 412 25 735 536 465 691 ...
$ Current.Ver : Factor w/ 1094 levels "0.0.1","0.0.2",..: 606 444 186 551 249 348 1093 600 347 332 ...
$ Android.Ver : Factor w/ 24 levels "1.5 and up","1.6 and up",..: 15 15 15 15 14 15 8 15 13 13 ...

To achieve the goal of the study, you create a [Answer: Cross-tabulation] table.
Content.Rating
Category Everyone Everyone 10+ Mature 17+ Teen
FAMILY 1529 131 50 261
GAME 608 131 74 331
NEWS_AND_MAGAZINES 169 66 14 34

At a 95% confidence level, you ran a [Answer]

data: [HIDDEN]

There is no significant relationship between the category and the content rating of the selected apps
from Google PlayStore.
There is a significant relationship between the category and the content rating of the selected apps
The content ratings of Google PlayStore apps are not related to the category.
Most apps categories in Google PlayStore are highly related with the content rating.
16. Qualitative data could be organized with the following ways:
Select one or more:
a. Relative frequency
b. Percentage
c. Frequency distribution table
d. Tally marks
e. Pie chart
f. Raw data
g. Bar plot
h. Boxplot
17. Which of the following are best treated as discrete variables?
Select one or more:
a. Gender
b. Names of your classmates
c. Number of students in a class
d. Average scores in a quiz
e. Number of students in the whole university
f. Grades frequency at the end of a course
g. Number of children in a family
h. Final scores in a course
18. The alternative hypothesis of a _____ t-test has the form of "The mean of x of the A group is higher
than ..."
Select one:
a. Paired
b. Two-tail
c. Half-tail
d. Unpaired
e. One-tail (correct answer)
19. To gain information about the number of elements in a vector, we use the _____ function.
Select one:
a. length() (correct answer)
b. ncol()
c. sizeof()
d. nrow()
e. getLength()
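A quick check in R illustrates the difference between these functions. The vector reuses the first values of the youtube column shown earlier; the data frame is built only for this example. sizeof() and getLength() are not base R functions.

```r
budget <- c(276.1, 53.4, 20.6, 181.8, 217)   # first values of the youtube column

print(length(budget))   # 5
print(nrow(budget))     # NULL: a plain vector has no dim attribute
print(ncol(budget))     # NULL

# For a data frame, length() counts columns, while nrow() counts rows
df <- data.frame(youtube = budget, sales = c(26.5, 12.5, 11.2, 22.2, 15.5))
print(length(df))       # 2
print(nrow(df))         # 5
```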
20. You are assigned to study if there is any connection between the district where a person lives and
his/her hobby. There are 671 randomly selected respondents that were interviewed. Their answers
are collected into a data frame with the following structure:
str(district.hobby)
$ district: Factor w/ 4 levels "DISTRICT 1","DISTRICT 2",..: 2 1 4 4 2 2 2 3 1 4 ...
$ hobby : Factor w/ 6 levels "BASKETBALL","FOOTBALL",..: 3 3 2 6 2 3 2 3 5 2 ...
Frequency distribution
table.
hobby
district BASKETBALL FOOTBALL PAINTING PHOTOGRAPHY SINGING TRAVELING
DISTRICT 1 39 29 19 28 37 29
DISTRICT 2 29 33 29 30 25 32
DISTRICT 3 26 24 30 22 30 19
DISTRICT 4 28 36 23 24 26 24

data: [HIDDEN]
X-squared = 13.811, df = 15, p-value = 0.5399
Some hobbies are significantly preferred in certain districts.
There is a significant relationship between the district where someone lives and his/her
hobby.
There is no significant relationship between the district where someone lives and his/her
hobby. (correct answer)
Someone's hobby is independent of the district where one lives. (correct answer)
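The reported result can be reproduced from the cross-tabulation above. The sketch below re-enters the printed counts by hand as a matrix (the object names are chosen here for illustration) and runs chisq.test() on it; with the raw data frame one would typically build the table with table(district.hobby$district, district.hobby$hobby) instead.

```r
hobby_tab <- matrix(
  c(39, 29, 19, 28, 37, 29,
    29, 33, 29, 30, 25, 32,
    26, 24, 30, 22, 30, 19,
    28, 36, 23, 24, 26, 24),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    district = paste("DISTRICT", 1:4),
    hobby = c("BASKETBALL", "FOOTBALL", "PAINTING",
              "PHOTOGRAPHY", "SINGING", "TRAVELING")
  )
)

print(sum(hobby_tab))   # 671 respondents

res <- chisq.test(hobby_tab)
print(res)

# Degrees of freedom: (rows - 1) * (columns - 1)
print((nrow(hobby_tab) - 1) * (ncol(hobby_tab) - 1))   # 15

# Decision at the 5% significance level: TRUE would mean "reject independence"
print(res$p.value < 0.05)   # FALSE
```

Because the p-value is well above 0.05, the null hypothesis of independence between district and hobby is not rejected, which is what the two marked answers state.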
21. X is a means to organize quantitative data. It shows the sum of a class and all classes below
it. What is X?
Select one:
a. Ogive
b. Cumulative Frequency Distribution (correct answer)
c. Histogram
d. Stem-and-leaf display
e. Frequency distribution table
22. Methods that can be used to find out whether the data is normally distributed or not are
Select one or more:
a. Observing histogram and density plot (correct answer)
b. Observing pie chart and bar plot
c. Applying the Shapiro-Wilk test (correct answer)
d. Applying the Kruskal-Wallis test
e. Applying the Kolmogorov-Smirnov test (correct answer)
f. Applying the Wilcoxon Rank Sum test
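As a rough illustration of the marked methods, the following R sketch applies them to simulated samples (one drawn from a normal distribution, one from a skewed exponential distribution). The sample names and parameters are arbitrary choices for this example.

```r
set.seed(7)   # simulated samples, used only for illustration
x_normal <- rnorm(100, mean = 50, sd = 10)
x_skewed <- rexp(100, rate = 0.1)

# Visual check: histogram with a density curve overlaid.
# pdf(NULL) opens a null device so no file is written;
# drop that line (and dev.off()) in an interactive session.
pdf(NULL)
hist(x_skewed, freq = FALSE, main = "Simulated skewed sample")
lines(density(x_skewed))
invisible(dev.off())

# Shapiro-Wilk: a small p-value suggests departure from normality
sw_normal <- shapiro.test(x_normal)
sw_skewed <- shapiro.test(x_skewed)
print(c(normal = sw_normal$p.value, skewed = sw_skewed$p.value))

# Kolmogorov-Smirnov against a normal with the sample's own mean and sd
# (estimating the parameters from the data makes this p-value approximate)
ks_skewed <- ks.test(x_skewed, "pnorm", mean(x_skewed), sd(x_skewed))
print(ks_skewed$p.value)
```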
23. Which of the following are best treated as continuous variables?
Select one or more:
a. The distance between two cities (correct answer)
b. Number of classes in a college building
c. Number of students who achieve pass grades
d. Height (correct answer)
24. A publishing company is currently reviewing proposals from bookstores in several universities.
These bookstores are asking for more programming books to be stocked for each of them. Since the
stock in the company's warehouse is limited, hence the management will decide the allocation
based on historical sales data. Therefore, the management asked you to make the analysis. The data
that they possess contains historical data of the number of students who took programming
courses and the number of programming books sold at the respective university bookstore. Should
a university with more students who took a programming course be allocated more
books?
The dataset that the management gave you is as follows:

str(student.books)
$ nstudents : int 204 179 200 177 207 195 166 178 213 130 ...
$ books_sold: int 441 329 467 376 504 396 354 439 461 235 ...
The column nstudents shows the number of students while the column books_sold shows the
number of books sold at a university bookstore with the respective number of students
who took the programming course.
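The first thing that you need to do is [Answer: Determine whether there is a strong relationship between the number of students taking a programming course and the number of programming books sold at a particular university bookstore.]
Therefore, you plotted the number of students with the respective numbers of books sold:
After that, you ran a [Answer: Pearson correlation test], and the result is R = 0.7680.
Based on the value of R, you know that [Answer: There is a strong positive relationship between the number of students taking a programming course and the number of programming books sold at a particular university bookstore.]
Therefore, you decided to summarize and study the relationship between the number of
students and the number of books sold by using [Answer: Simple Linear Regression].
The results came as follows: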
Call:
lm(formula = books_sold ~ nstudents, data = student.books)
Residuals:
-80.265 -37.203 -2.531 38.198 83.988
Coefficients:
(Intercept) 1.1796 23.6051 0.05 0.96
nstudents 2.1165 0.1166 18.15 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
1.1796 is a strong predictor of the model

For each student taking the programming class, the respective university bookstore could
expect a sale of 2.1165 books.
The number of students taking the programming class is a significant predictor of sales.
For a number of students equal to zero, a bookstore may expect a sale of 1.1796 books;
however, since the intercept itself is not significant, the bookstore should not rely
on that.
25. Frequency of a categorical variable could be visualized with
Select one:
a. Pie chart
b. Ogive
c. Bar plot
d. Line plot
e. Boxplot
Started on Tuesday, 15 December 2020, 23:41
State Finished
Completed on Tuesday, 15 December 2020, 23:44
Time taken 3 min 31 sec
Grade 10.00 out of 10.00 (100%)
Question 1
Correct
Mark 1.00 out of 1.00
The suitable statistical test(s) to compare a variable in 3 or more groups is (are)
Select one or more:
a. Paired samples t-test
b. Kruskal-Wallis
c. ANOVA
d. Kolmogorov-Smirnov
e. Wilcoxon Rank Sum test
f. Unpaired samples t-test

Feedback
The correct answers are: ANOVA, Kruskal-Wallis
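Both tests named in the feedback are available in base R. The sketch below runs them on simulated data with three groups; the data frame dat and its group means are invented for this example.

```r
set.seed(3)   # simulated data for illustration only
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 15)),
  value = c(rnorm(15, 10), rnorm(15, 11), rnorm(15, 13))
)

# Parametric: one-way ANOVA (assumes roughly normal groups)
anova_fit <- aov(value ~ group, data = dat)
print(summary(anova_fit))

# Non-parametric alternative: Kruskal-Wallis rank sum test
kw <- kruskal.test(value ~ group, data = dat)
print(kw)
```

Which of the two to use is usually decided by a normality check on each group, as in the worked questions in this document.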
Question 2
Correct
Mark 1.00 out of 1.00
Background
You are assigned to analyze a dataset that contains measures of cholesterol concentration in 72
participants treated with three different drugs. The aim is to examine the potential of a new class of
drugs in lowering cholesterol concentration and consequently reducing heart attack. The participants
include 36 males and 36 females. Males and females were further (equally) subdivided into whether
they were at low or high risk of a heart attack. Is there any difference in the impact of each drug on
cholesterol concentration? If any, which one has the highest impact, in terms of the lowest
cholesterol concentration?
Data Exploration
str(heartattack)
$ gender : Factor w/ 2 levels "male","female": 1 1 1 1 1 1 1 1 1 1 ...
$ risk : Factor w/ 2 levels "high","low": 2 2 2 2 2 2 2 2 2 2 ...
$ drug : Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
$ cholesterol: num 5.24 5.08 4.68 5.36 4.96 ...
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
You then plotted a [Answer: boxplot]
the [Answer] of the cholesterol concentration on each group by using
the [Answer: Shapiro-Wilk Test]. The p-value for each tested group is shown in the following table:
drug p-value
A 0.1537620
B 0.7674545
C 0.5537145
Statistical Tests
Based on it, you decide to use a [Answer: parametric method]
to find whether there is any difference between the drugs used toward the cholesterol concentration.
Due to the nature of the problem, you ran
a [Answer: One-way ANOVA]
drug 2 1.235 0.6177 2.63 0.0793 .
Residuals 69 16.204 0.2348
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

to [Answer: draw a final conclusion]
Therefore you ran a [Answer: FURTHER TEST IS UNNECESSARY]
Fit: [HIDDEN]
$drug
diff lwr upr p adj
B-A -0.277327333 -0.6124096 0.05775494 0.1241979
C-A -0.278421280 -0.6135035 0.05666099 0.1222405
C-B -0.001093947 -0.3361762 0.33398832 0.9999663
Conclusion
drug B yields a significantly less cholesterol rate than drug A
the drugs gave no significantly different impact on the cholesterol rate
drug C yields a significantly less cholesterol rate than drugs A and B
the experiment is a mess

drug that yields the lowest cholesterol rate is drug C, followed with drug B, and then drug A
Mark 5.00 out of 5.00
The correct answer is:
• the drugs gave no significantly different impact on the cholesterol rate
(Credit: the dataset used in this vignette is based on the heartattack dataset in the datarium package)
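The conclusion can be checked against the numbers quoted in the ANOVA and Tukey output above. The sketch below only re-enters those printed values (the variable names are chosen here); it does not refit the model.

```r
# Values copied from the ANOVA table above
ms_drug  <- 0.6177   # Mean Sq for drug (Df = 2)
ms_resid <- 0.2348   # Mean Sq for residuals (Df = 69)

f_value <- ms_drug / ms_resid
print(round(f_value, 2))   # 2.63

# Upper-tail probability of the F distribution
p_value <- pf(f_value, df1 = 2, df2 = 69, lower.tail = FALSE)
print(p_value)
print(p_value < 0.05)      # FALSE: not significant at the 5% level

# Every Tukey interval quoted above contains zero
tukey <- data.frame(
  pair = c("B-A", "C-A", "C-B"),
  lwr  = c(-0.6124096, -0.6135035, -0.3361762),
  upr  = c(0.05775494, 0.05666099, 0.33398832)
)
tukey$contains_zero <- tukey$lwr < 0 & tukey$upr > 0
print(tukey)
```

Since the overall F test is not significant at the 5% level, a post-hoc test is not needed, and the Tukey intervals shown for completeness agree with that.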
Question 3
Correct
Mark 1.00 out of 1.00
To gain information about the number of elements in a vector, we use the _____ function.
Select one:
a. ncol()
b. length()
c. sizeof()
d. nrow()
e. getLength()
Feedback
The correct answer is: length()
Question 4
Correct
Mark 1.00 out of 1.00
You are interested in learning the favorite programming languages of first-year Indonesian
Informatics and/or Computer Science undergraduate students. To achieve this, you asked your
high school classmates who were admitted to the specified program.
a. The data are collected properly and bias is minimized. [Answer: TRUE]
b. Because a variable is a characteristic of each individual on which data is collected, which of
the following are variables that suit well with the research question?
gender
Mark 1.00 out of 1.00
• chosen programming language
a time plot
a boxplot
a pie chart
a bar graph
Mark 1.00 out of 1.00
• a bar graph
• a pie chart
Question 5
Correct
Mark 1.00 out of 1.00
A publishing company is currently reviewing proposals from bookstores in several universities. These
bookstores are asking for more programming books to be stocked for each of them. Since the stock in
the company's warehouse is limited, the management will decide the allocation based on
historical sales data. Therefore, the management asked you to make the analysis. The data that they
possess contains historical data of the number of students who took programming courses and the
number of programming books sold at the respective university bookstore. Should a university with
more students who took a programming course be allocated more books?

str(student.books)
$ nstudents : int 204 179 200 177 207 195 166 178 213 130 ...
$ books_sold: int 441 329 467 376 504 396 354 439 461 235 ...
number of books sold at a university bookstore with the respective number of students who
took the programming course.
The first thing that you need to do is [Answer: Determine whether there is a strong relationship between the number of students taking a programming course and the number of programming books sold at a particular university bookstore.]

After that, you ran a [Answer]
Based on the value of R, you know that [Answer: There is a strong positive relationship between the number of students taking a programming course and the number of programming books sold at a particular university bookstore.]
Therefore, you decided to summarize and study the relationship between the number of
students and the number of books sold by
using [Answer]
Call:
Residuals:
-80.265 -37.203 -2.531 38.198 83.988
Coefficients:
(Intercept) 1.1796 23.6051 0.05 0.96
nstudents 2.1165 0.1166 18.15 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The number of students taking the programming class is a significant predictor of
sales.
For a number of students equal to zero, a bookstore may expect a sale of 1.1796
books; however, since the intercept itself is not significant, the bookstore should not rely
on that.
Mark 3.00 out of 3.00
• For a number of students equal to zero, a bookstore may expect a sale of 1.1796
books; however, since the intercept itself is not significant, the bookstore should not
rely on that.
• The number of students taking the programming class is a significant predictor of sales.
• For each student taking the programming class, the respective university bookstore
could expect a sale of 2.1165 books.
Based on the model, for a university with 100 students taking a programming course, the publisher
could expect the respective bookstore would sell [Answer: 212.82]
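The prediction is the fitted line evaluated at nstudents = 100. A minimal sketch using the coefficients printed in the summary above (variable names chosen here for illustration):

```r
# Coefficients copied from the lm() summary (books_sold ~ nstudents)
intercept <- 1.1796
slope     <- 2.1165

nstudents  <- 100
books_pred <- intercept + slope * nstudents
print(books_pred)   # 212.8296
```

The recorded answer, 212.82, is this value cut to two decimals.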
Question 6
Correct
Mark 1.00 out of 1.00
What can be learned from a histogram and/or a stem-and-leaf display?
a. Data symmetricity
b. Outliers
c. Central tendency
d. Distribution gap
e. Data distribution
Feedback
The correct answers are: Data symmetricity, Data distribution, Outliers, Distribution gap,
Central tendency
Question 7
Correct
Mark 1.00 out of 1.00
You are assigned to study if there is any connection between the district where a person lives and
his/her preferred social media. There are 1,200 randomly selected respondents that were interviewed.
Their answers are collected into a data frame with the following structure:
str(district.socmed)
$ district: Factor w/ 4 levels "DISTRICT 1","DISTRICT 2",..: 4 1 3 1 1 3 1 2 4 4 ...

$ socmed : Factor w/ 6 levels "FACEBOOK","FRIENDSTER",..: 4 4 4 1 5 6 3 2 1 3 ...
You explored the data by making a barplot that shows the grouped distribution, and it came as
follows:
To achieve the goal of the study, you create a [Answer: Cross-tabulation] table.
socmed
district FACEBOOK FRIENDSTER INSTAGRAM LINKEDIN RESEARCHGATE TWITTER
DISTRICT 1 65 48 40 55 43 57
DISTRICT 2 48 46 51 53 29 58
DISTRICT 3 51 51 33 59 65 42
DISTRICT 4 49 57 57 47 54 42

At a 95% confidence level, you ran a [Answer]
data: [HIDDEN]
Someone's preferred social media is independent of the district where one lives.
preferred social media.
Some social media are significantly preferred in certain districts.
preferred social media.

Mark 2.00 out of 2.00
• There is a significant relationship between the district where someone lives and
his/her preferred social media.
• Some social media are significantly preferred in certain districts.
Question 8
Correct
Mark 1.00 out of 1.00
You are interested in knowing the achievement of the present second-year students in your
program in their first semester. It is measured according to the GP achieved. You then collected
the 1st semester GP of 31 randomly selected second-year students and calculated the mean.
Statistics → Average
Population → Second-year students
Parameter → GP
Samples → 31 randomly selected second-year students
Feedback
The correct answer is: Statistics → Average, Population → Second-year students, Parameter →
GP, Samples → 31 randomly selected second-year students
Question 9
Correct
Mark 1.00 out of 1.00

STU003 law NA
Observation → personal notebook/PC
Variable → access_mean
Element → STU001
Feedback
Your answer is correct.
The correct answer is: Observation → personal notebook/PC, Variable → access_mean, Element
→ STU001
Question 10
Correct
Mark 1.00 out of 1.00
Which of the following are best treated as discrete variables?
a. Number of students in a class
b. Number of students in the whole university
c. Average scores in a quiz

d. Final scores in a course
e. Names of your classmates
f. Gender
g. Number of children in a family
h. Grades frequency at the end of a course

Feedback
The correct answers are: Number of students in a class, Number of students in the whole
university, Grades frequency at the end of a course, Number of children in a family
SORTIR
1. The suitable statistical test(s) to compare a variable to a specific value is(are)

• Wilcoxon Signed Rank test
• One-sample t-test
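Both listed tests compare a sample against a single reference value through the mu argument. The sketch below uses simulated GP values and a hypothetical reference value of 3.0; neither comes from the source.

```r
set.seed(11)   # simulated GP values, for illustration only
gp  <- rnorm(31, mean = 3.1, sd = 0.4)
mu0 <- 3.0     # hypothetical reference value

# Parametric: one-sample t-test of H0: mean = mu0
tt <- t.test(gp, mu = mu0)
print(tt)

# Non-parametric: Wilcoxon signed rank test of H0: location = mu0
wt <- wilcox.test(gp, mu = mu0)
print(wt)
```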
2.
To achieve the goal of the study, you create a Cross-tabulation table.
At a 95% confidence level, you ran a Chi-square Test of Independence, and the result came
as follows:
• There is no significant relationship between the district where someone lives with
his/her hobby
• Someone's hobby is independent of the district where one lives.
3. The following table contains a subset of the results from a survey about how the first year
UNSRAT undergraduate students access e-Learning
Observation → 7
Element → STU003
Variable → SATISFACTION
4. Frequency of a categorical variable could be visualized with: Bar plot
5. You are interested in learning the favorite programming languages of first-year Indonesian
Informatics and/or Computer Science undergraduate students. To achieve this, you
asked your high school classmates who were admitted to the specified program.
A. The data are collected properly and bias is minimized: TRUE

B. Because a variable is a characteristic of each individual on which data is collected, which of
the following are variables that suit well with the research question?
• Chosen programming language
C. Which chart or graph would be appropriate to display the concerned variable(s)?
• a bar graph
• a pie chart
6. Which one is NOT a function used to store data?
• write.csv()
7. You are interested in knowing the achievement of the present second-year students in your
program in their first semester. It is measured according to the GP achieved. You then collected
the 1st semester GP of 31 randomly selected second-year students and calculated the mean.
Match the item/condition from the example above with the right term!
• Statistics AVERAGE
• Parameter GP
• Population Second-year students
• Samples 31 randomly selected second-year students
8. A publishing company is currently reviewing proposals from bookstores in several universities.
These bookstores are asking for more programming books to be stocked for each of them. Since
the stock in the company's warehouse is limited, the management will decide the
allocation based on historical sales data. Therefore, the management asked you to make the
analysis. The data that they possess contains historical data of the number of students who took
programming courses and the number of programming books sold at the respective university
bookstore. Should the university with more students who took a programming course be
allocated more books?
The dataset that the management gave you is as follows.
The column nstudents shows the number of students while the column books_sold shows the
number of books sold at a university bookstore with the respective number of students who
took the programming course.
The first thing that you need to do is
Determine whether there is a strong relationship between the number of students taking a
programming course and the number of programming books sold at a particular university
bookstore
After that, you ran a Pearson correlation test and the result is R = 0.7680
Based on the value of R, you know that
There is a strong positive relationship between the number of students taking a programming course
and the number of programming books sold at a particular university bookstore
Therefore, you decided to summarize and study the relationship between the number of students and
the number of books sold by using Simple Linear Regression. The results came as
follows
• For each student taking the programming class, the respective university bookstore could
expect a sale of 2.1165 books
• The number of students taking the programming class is a significant predictor of sales
• For a number of students that equals zero, a bookstore may expect a sale of 1.1796 books;
however, since the intercept itself is not significant, the bookstore should not rely on
that
Based on the model, for a university with 100 students taking a programming course, the publisher
could expect the respective bookstore would sell 212.82
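The 212.82 figure comes from plugging 100 students into the fitted regression line. A minimal sketch in R, using only the two coefficients quoted in this question (the helper name predict_books is hypothetical and chosen here for illustration):

```r
# Coefficients copied from the regression output quoted in this question
intercept <- 1.1796   # expected books sold when nstudents = 0 (not significant)
slope     <- 2.1165   # additional books sold per extra student

# Hypothetical helper: predicted books sold for a given number of students
predict_books <- function(nstudents) intercept + slope * nstudents

predicted <- predict_books(100)
print(predicted)
```

The computed value is 212.8296, which the quiz records as 212.82.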
9. Which of the following are best treated as continuous variables?

• The distance between two cities

• Height
10. You are assigned to analyze a dataset that contains the performance score measures of
participants at two time points. The aim of this study is to evaluate the effect of gender and
stress on performance scores. Is there any difference in performance between different stress
levels? If any, which one yields the best/worst performance score?
Boundary
Since the performance is measured twice, in this problem we only focus on the first
measurement (the t1 column).
You then plotted a boxplot and the result came as follows.
Since you were comparing a variable in three different groups, you need the proper
method. Therefore, you need to decide which method to use. So you start with checking the
distribution normality of the performance score on each group by using the Shapiro-Wilk Test. The
p-value for each tested group is shown in the following table:
Statistical Tests
Based on it, you decide to use a parametric method to find whether there is any difference
between the stress levels on the performance. Due to the nature of the problem, you ran a
One-way ANOVA and the result is as
follows.
Based on the result, you decide to
continue with further tests to find which stress level has a significantly different impact on
performance.
Therefore you ran a TukeyHSD and the results came as follows:
• there is no significant performance difference between employees with moderate and
low stress levels
• employees with a high stress level tend to have significantly lower performance
compared to employees with moderate and low stress levels
Question 1
Correct
Mark 1.00 out of 1.00
Which of the following are best treated as nominal variables?
Select one or more:

a. Number of children in a family
b. Gender
c. Number of students in a class
d. Phone number
e. Number of students in the whole university
f. Names of your classmates

g. Grades frequency at the end of a course
Feedback
The correct answers are: Gender, Phone number, Names of your classmates
Question 2
Correct
The suitable statistical test(s) to compare a variable to a specific value is(are)
Select one or more:

a. Wilcoxon Rank Sum test
b. Wilcoxon Signed Rank test

c. ANOVA
d. One-sample t-test
e. Paired samples t-test
f. Kruskal-Wallis test
Feedback
The correct answers are: One-sample t-test, Wilcoxon Signed Rank test
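Both tests named above are available in base R. The sketch below is illustrative only: the scores and the target value 70 are made-up numbers, not data from the quiz.

```r
# Hypothetical sample: scores of 12 students (made-up numbers)
scores <- c(72, 65, 80, 77, 69, 74, 81, 68, 70, 76, 73, 79)
target <- 70  # the specific value to compare against (an assumption)

# Parametric option: one-sample t-test of H0: mean(scores) == target
t_res <- t.test(scores, mu = target)

# Non-parametric option: Wilcoxon signed rank test against the same value.
# exact = FALSE requests the normal approximation, which also copes with
# ties and zero differences in the data.
w_res <- wilcox.test(scores, mu = target, exact = FALSE)

print(t_res$p.value)
print(w_res$p.value)
```

The t-test is the usual choice when the variable is roughly normally distributed; the signed rank test is the fallback when that assumption is doubtful.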
Question 3
Correct
Qualitative data could be organized in the following ways:
Select one or more:

a. Raw data
b. Pie chart
c. Relative frequency
d. Bar plot
e. Percentage
f. Tally marks
g. Frequency distribution table

h. Boxplot
Feedback
The correct answers are: Frequency distribution table, Tally marks, Relative frequency, Percentage
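As an illustration of these four ways of organizing qualitative data, the following R sketch builds a frequency distribution table with relative frequencies and percentages. The ten responses are hypothetical, not survey data from the quiz.

```r
# Hypothetical qualitative data: preferred learning method of 10 students
method <- c("online", "blended", "offline", "blended", "online",
            "blended", "offline", "blended", "online", "blended")

freq     <- table(method)       # frequency of each category (the tally)
rel_freq <- prop.table(freq)    # relative frequency (proportions)
percent  <- 100 * rel_freq      # percentage

# Combine everything into one frequency distribution table
summary_table <- data.frame(
  method    = names(freq),
  frequency = as.vector(freq),
  relative  = as.vector(rel_freq),
  percent   = as.vector(percent)
)
print(summary_table)
```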
Question 4
Correct
1,341 undergraduate students were surveyed to gain knowledge about the preferred teaching-and-learning
method of the whole UNSRAT students. There are three teaching-and-learning methods:
online, offline, or blended. The answers were then tabulated and the frequency of each method is
presented in the report. Match the item/condition from the example above with the right term!
Feedback
The correct answer is: Variable → preferred teaching-and-learning method, Statistics → frequency,
Population → UNSRAT students, Samples → the surveyed 1,341 undergraduate students
Question 5
Correct
1,341 undergraduate students were surveyed to gain knowledge about the preferred teaching-and-learning
method of the whole UNSRAT students. There are three teaching-and-learning methods: online,
offline, or blended. The following table contains a subset of the results
questionnaire_code preferred_tlm
STU001 blended
STU002 offline
STU003 online
STU004 blended
Element → STU003
Variable → preferred_tlm
Observation → blended
Feedback
The correct answer is: Element → STU003, Variable → preferred_tlm, Observation → blended
Question 6
Correct
A publishing company is currently reviewing proposals from bookstores in several universities. These
bookstores are asking for more programming books to be stocked for each of them. Since the stock in
the company's warehouse is limited, the management will decide the allocation based on
historical sales data. Therefore, the management asked you to make the analysis. The data that they
possess contains historical data of the number of students who took programming courses and the
number of programming books sold at the respective university bookstore. Should the university with
more students who took a programming course be allocated more books?

str(student.books)
$ nstudents : int 204 179 200 177 207 195 166 178 213 130 ...
$ books_sold: int 441 329 467 376 504 396 354 439 461 235 ...
The column nstudents shows the number of students while the column books_sold shows the
number of books sold at a university bookstore with the respective number of students who took
the programming course.
The first thing that you need to do is: Determine whether there is a strong relationship between the number of students taking a programming course and the number of programming books sold at a particular university bookstore.
After that, you ran a Pearson correlation test and the result is R = 0.7680.
Based on the value of R, you know that: There is a strong positive relationship between the number of students taking a programming course and the number of programming books sold at a particular university bookstore.
Therefore, you decided to summarize and study the relationship between the number of students
and the number of books sold by using Simple Linear Regression. The results came as follows:

Call:
Residuals:
     Min       1Q   Median       3Q      Max
 -80.265  -37.203   -2.531   38.198   83.988
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.1796    23.6051    0.05     0.96
nstudents     2.1165     0.1166   18.15   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The correct answer is:
• For a number of students that equals zero, a bookstore may expect a sale
of 1.1796 books; however, since the intercept itself is not significant, the
bookstore should not rely on that.
• The number of students taking the programming class is a significant
predictor of sales.
• For each student taking the programming class, the respective university
bookstore could expect a sale of 2.1165 books.
Based on the model, for a university with 100 students taking a programming course, the publisher
could expect the respective bookstore would sell 212.82
Question 7
Correct
Suppose that you are interested in the percentage of cellphone brands owned by the students of
UNSRAT. Therefore, on Wednesday, after class, you asked all your classmates about the brands of
their cellphones.
a. Why can collecting data only from your classmates cause bias in the data?
Perhaps some of your classmates do not bring their cellphones on Wednesday.
The subjects were not randomly selected.
It assumes your classmates represent the whole population of UNSRAT students.

You should ask students from the other classes too.
It assumes the percentage of the cellphone brands owned by the first-year students may
represent the whole population of UNSRAT students.

• It assumes your classmates represent the whole population of UNSRAT students.
• It assumes the percentage of the cellphone brands owned by the first-year
students may represent the whole population of UNSRAT students.
• The subjects were not randomly selected.
b. Because a variable is a characteristic of each individual on which data is collected, which of the
following are variables in this study?
One of your classmates.
The day the data collected.
cellphone brand
gender
• cellphone brand
c. Which chart or graph would be appropriate to display the proportion of the brands?
pie chart
boxplot
time plot
line plot
bar graph
• bar graph
• pie chart
Question 8
Correct
Background
You are assigned to analyze a dataset that contains the final scores of students in three parallel
classes. Is there any difference in the scores of the students in a different class?
Data Exploration
str(quiz.result)
$ class: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...
$ score: num 55.2 88.3 91.6 91.4 81.3 ...
You then plotted a boxplot.
Since you were comparing scores in three different quizzes with the same participants (students),
you need the proper method. Therefore, you need to decide which method to use. So you start with
checking the distribution normality of the scores in each class by using the Shapiro-Wilk Test.
The p-value for each tested class is shown in the following table:
class p-value
A 0.4505716
B 0.2808105
C 0.2490939
Statistical Tests
You then decide to use a non-parametric method
to find whether there is any difference in those quizzes. Due to the nature of the problem, you ran
a Kruskal-Wallis test and the result is as follows:

data: score by class
[HIDDEN] chi-squared = 55.454, df = 2, p-value = 9.086e-13

Based on the result, you decide to
continue with further tests to find in which quiz students tend to achieve better scores.
Therefore you ran a Dunn Test and the results came as follows:

Comparison of x by group
(Bonferroni)
Col Mean-|
Row Mean | A B
---------+----------------------
B | 2.193561
| 0.0848
C | 7.259698 5.066137
| 0.0000* 0.0000*
alpha = 0.05
Reject Ho if p <= alpha
Conclusion
There is no significantly different score achievements in all three classes

Students tend to achieve significantly higher scores in class A, followed by those in class B, and
the students in class C, yet the differences are not significant
Students tend to achieve significantly higher scores in class A, followed by those in class B, and
the students in class C.
Students in classes A and B tend to have higher scores than students in class C. The scores are
not significantly different between students in classes A and B

• Students in classes A and B tend to have higher scores than students in class
C. The scores are not significantly different between students in classes A
and B
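The workflow in this question (normality check per group, Kruskal-Wallis test, then pairwise comparisons) can be sketched in base R as follows. The data are simulated and are not the quiz.result dataset. The Dunn output shown in the question has the layout printed by the dunn.test add-on package; to keep this sketch free of extra packages, pairwise Wilcoxon rank sum tests with Bonferroni adjustment are used as a stand-in, which is a different post-hoc procedure.

```r
set.seed(1)  # simulated data, NOT the quiz.result dataset from the question
quiz_sim <- data.frame(
  class = factor(rep(c("A", "B", "C"), each = 20)),
  score = c(rnorm(20, 85, 5), rnorm(20, 82, 5), rnorm(20, 65, 5))
)

# Shapiro-Wilk normality test within each class
normality_p <- tapply(quiz_sim$score, quiz_sim$class,
                      function(x) shapiro.test(x)$p.value)
print(normality_p)

# Kruskal-Wallis rank sum test: do the score distributions differ by class?
kw <- kruskal.test(score ~ class, data = quiz_sim)
print(kw)

# Stand-in for the Dunn test: pairwise Wilcoxon rank sum tests,
# p-values adjusted with the Bonferroni method
posthoc <- pairwise.wilcox.test(quiz_sim$score, quiz_sim$class,
                                p.adjust.method = "bonferroni")
print(posthoc)
```

A significant Kruskal-Wallis result only says that at least one class differs; the pairwise step is what identifies which pairs differ.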
Question 9
Correct
You are assigned to study whether there is a relationship between video game publishers and the video
game genres. You have a dataset with the following structure:
str(vgs)
$ Rank : int 1 3 4 8 11 12 14 15 16 17 ...
$ Name : Factor w/ 8427 levels ".hack: Sekai no Mukou ni + Versus",..: 8048 4013
8049 8046 5006 4012 8042 8043 3598 2681 ...
$ Platform : Factor w/ 10 levels "DS","PC","PS",..: 8 8 8 8 1 1 8 8 9 5 ...
$ Year : Factor w/ 29 levels "1985","1988",..: 16 18 19 16 15 15 17 19 20 23 ...
$ Genre : Factor w/ 10 levels "Action","Adventure",..: 9 5 9 4 8 5 9 9 4 1 ...
$ Publisher : Factor w/ 467 levels "10TACLE Studios",..: 297 297 297 297 297 297 297
297 266 402 ...
$ NA_Sales : num 41.49 15.85 15.75 14.03 9.07 ...
$ EU_Sales : num 29 12.9 11 9.2 11 ...
$ JP_Sales : num 3.77 3.79 3.28 2.93 1.93 4.13 3.6 2.53 0.24 0.97 ...
$ Other_Sales : num 8.46 3.31 2.96 2.85 2.75 1.92 2.15 1.79 1.67 4.14 ...
$ Global_Sales: num 82.7 35.8 33 29 24.8 ...
To achieve the goal of the study, you create a Cross-tabulation table.
Genre
Platform Action Adventure Fighting Misc Racing Role-Playing Shooter Simulation
DS 343 240 36 393 67 200 42 285
PC 165 65 6 24 60 104 148 115
PS 157 69 108 76 145 97 96 60
PS2 348 196 150 222 216 187 160 90

PS3 380 74 76 124 92 119 156 31
PS4 122 19 17 15 17 47 34 5
PSP 222 213 74 106 65 192 37 29
Wii 238 84 42 280 94 35 66 87
X360 324 47 65 126 105 76 203 40
XB 155 26 48 46 123 23 132 24
Genre
Platform Sports Strategy
DS 148 79
PC 49 188
PS 222 70
PS2 400 71
PS3 213 24
PS4 43 5
PSP 135 60
Wii 261 25
X360 220 28
XB 170 21


data: [HIDDEN]
There is a significant relationship between video-game platform and the video-game
genres.
All platforms share the same amount of video game publications on each genre.
There is no significant relationship between genres and the platform used.
There are certain video game genres that commonly published for specific platforms.
• There is a significant relationship between video-game platform and the video-game genres.
• There are certain video game genres that are commonly published for specific
platforms.
(Credit: the dataset was taken from "Video Game Sales Analyze sales data from more than 16,500 games"
by Gregory Smith, available on Kaggle.)
Question 10
Correct
To gain information about the number of elements in a vector, we use the _____ function.
Select one:
a. ncol()
b. length()
c. sizeof()
d. nrow()
e. getLength()
Question 1
Which one of the following is best treated as an ordinal variable?
Select one:
a. educational degree
b. hair color
c. phone number
d. city of residence
e. salary
Feedback
The best choice is: educational degree
Question 2
You are interested in knowing the percentage of how the first year UNSRAT undergraduate students
access e-Learning. To estimate the percentage, you survey with 500 randomly selected students and
determine what are the means used by the students.
Feedback
The correct answer is: Parameter → The means used to access e-Learning, Samples → 500 randomly
selected first year UNSRAT undergraduate students, Population → First year UNSRAT undergraduate
students, Statistics → Percentage
Question 3
X is a means to organize quantitative data. It shows the sum of a class and all classes below it. What is X?
Select one:
a. Stem-and-leaf display
b. Histogram
c. Cumulative Frequency Distribution
d. Frequency distribution table
e. Ogive
Feedback
The best choice is: Cumulative Frequency Distribution
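To make the definition concrete, here is a small R sketch that builds a cumulative frequency distribution. The twelve scores and the class width of 10 are made up for illustration.

```r
# Hypothetical quantitative data: 12 exam scores
scores <- c(52, 67, 71, 58, 89, 93, 76, 64, 81, 70, 55, 98)

# Group the scores into classes of width 10: [50,60), [60,70), ..., [90,100)
classes <- cut(scores, breaks = seq(50, 100, by = 10), right = FALSE)

freq     <- table(classes)   # frequency of each class
cum_freq <- cumsum(freq)     # each class plus all classes below it

print(data.frame(class      = names(freq),
                 frequency  = as.vector(freq),
                 cumulative = as.vector(cum_freq)))
```

The last cumulative value always equals the total number of observations, here 12.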
Question 4
A publishing company is currently reviewing proposals from bookstores in several universities.

These bookstores are asking for more programming books to be stocked for each of them. Since
the stock in the company's warehouse is limited, hence the management will decide the
allocation based on historical sales data. Therefore, the management asked you to make the
analysis. The data that they possess contains historical data of the number of students who took
programming courses and the number of programming books sold at the respective university
bookstore. Should the university with more students who took a programming course to be
allocated more books?

str(student.books)
$ nstudents : int 204 179 200 177 207 195 166 178 213 130 ...
$ books_sold: int 441 329 467 376 504 396 354 439 461 235 ...
The column nstudents shows the number of students while the column books_sold shows the number
of books sold at a university bookstore with the respective number of students who took the
programming course.
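The first thing that you need to do is: Determine whether there is a strong relationship between the
number of students taking a programming course and the number of programming books sold at a
particular university bookstore.
After that, you ran a Pearson correlation test and the result is R = 0.7680.
Based on the value of R, you know that: There is a strong positive relationship between the number of
students taking a programming course and the number of programming books sold at a particular
university bookstore.
Therefore, you decided to summarize and study the relationship between the number of students and
the number of books sold by using Simple Linear Regression. The results came as follows: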

Call:
Residuals:
     Min       1Q   Median       3Q      Max
 -80.265  -37.203   -2.531   38.198   83.988
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.1796    23.6051    0.05     0.96
nstudents     2.1165     0.1166   18.15   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

For each student taking the programming class, the respective university bookstore could expect a
sale of 2.1165 books.
For a number of students that equals zero, a bookstore may expect a sale of 1.1796 books;
however, since the intercept itself is not significant, the bookstore should not rely on that.
For each student taking the programming class, the respective university bookstore could expect a
sale of 1.1796 books.
Mark 3.00 out of 3.00
• For a number of students that equals zero, a bookstore may expect a sale of 1.1796 books;
however, since the intercept itself is not significant, the bookstore should not rely on that.
• For each student taking the programming class, the respective university bookstore could expect
a sale of 2.1165 books.
Based on the model, for a university with 100 students taking a programming course, the
publisher could expect the respective bookstore would sell 212.82
Question 5
You are assigned to study whether there is a relationship between the category and the content
rating of selected apps in Google PlayStore. You have a dataset with the following structure:
$ App : Factor w/ 3088 levels "¡Ay Caramba!",..: 2472 2679 654
2617 580 1701 2595 790 2762 2343 ...
$ Category : Factor w/ 3 levels "FAMILY","GAME",..: 2 2 2 2 2 2 2 2 2 2
...
$ Rating : num 4.5 4.5 4.4 4.7 4.5 4.2 4.4 4.6 4.3 4.3 ...
$ Reviews : Factor w/ 2379 levels "0","1","10","100",..: 1556 1061 828
969 412 1363 1730 865 2175 68 ...
$ Size : Factor w/ 219 levels "1.0M","1.1M",..: 143 165 163 43 93
45 219 215 136 45 ...
$ Installs : Factor w/ 21 levels "0","0+","1,000,000,000+",..: 10 3 19
7 7 16 10 10 19 19 ...
$ Type : Factor w/ 3 levels "Free","NaN","Paid": 1 1 1 1 1 1 1 1 1
1 ...
$ Price : Factor w/ 38 levels "$0.99","$1.04",..: 38 38 38 38 38 38
38 38 38 38 ...
$ Content.Rating: Factor w/ 4 levels "Everyone","Everyone 10+",..: 2 2 1 1 1
1 1 2 1 1 ...
$ Genres : Factor w/ 85 levels "Action","Action;Action &
Adventure",..: 4 7 23 19 23 29 1 77 1 23 ...
$ Last.Updated : Factor w/ 921 levels "April 1, 2017",..: 460 396 465 78
412 25 735 536 465 691 ...
$ Current.Ver : Factor w/ 1094 levels "0.0.1","0.0.2",..: 606 444 186 551
249 348 1093 600 347 332 ...
$ Android.Ver : Factor w/ 24 levels "1.5 and up","1.6 and up",..: 15 15 15
15 14 15 8 15 13 13 ...
Cross-tabulation table.
Content.Rating
FAMILY 1529 131 50 261
GAME 608 131 74 331

data: [HIDDEN]
There is no significant relationship between the category and the content rating of the selected
apps from Google PlayStore.
There is a significant relationship between the category and the content rating of the selected apps
from Google PlayStore.
Mark 0.00 out of 2.00
• There is a significant relationship between the category and the content rating of the selected
apps from Google PlayStore.
• Most apps categories in Google PlayStore are highly related with the content rating.
(Credit: the dataset was taken and subsetted from "Google Play Store Apps Web scraped data of
10k Play Store apps for analysing the Android market" by Lavanya Gupta, available on Kaggle.)
Question 6
You are interested in learning the favorite programming languages of the first year Indonesian
Informatics and/or Computer Science undergraduate students. To achieve this mission, you
asked your highschool classmates who were admitted to the specified program.
a. The data are collected properly and bias is minimized Answer
gender
Mark 1.00 out of 1.00
a time plot
a pie chart
a boxplot
a bar graph
Mark 1.00 out of 1.00
• a bar graph
• a pie chart
Question 7
Incorrect
Mark 0.00 out of 1.00
Which function(s) is(are) used to store a workspace in R?
a. write.csv()
b. saveRDS()
c. save()
d. save.image()
e. save.csv()
Feedback
The best choices are: save.image(), save()
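The difference between the two functions can be seen in a short R sketch. The object names and temporary file paths below are chosen for illustration only.

```r
# Example objects in the workspace (values taken from the marketing data
# shown elsewhere in this document; the object names are illustrative)
budget <- c(276.1, 53.4, 20.6)
sales  <- c(26.5, 12.5, 11.2)

# Temporary file paths so that nothing is left behind in the project folder
some_file <- tempfile(fileext = ".RData")
all_file  <- tempfile(fileext = ".RData")

save(budget, sales, file = some_file)  # store only the named objects
save.image(file = all_file)            # store the whole global workspace

# Load the first file into a fresh environment to confirm what was stored
restored <- new.env()
load(some_file, envir = restored)
print(ls(restored))
```

By contrast, saveRDS() stores a single object and write.csv() writes a data frame as text, so neither stores a workspace.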

Question 8
Partially correct
Mark 0.73 out of 1.00
Background
You are assigned to analyze a dataset that contains measures of cholesterol concentration in 72
participants treated with three different drugs. The aim is to examine the potential of a new class
of drugs in lowering cholesterol concentration and consequently reducing heart attack. The
participants include 36 males and 36 females. Males and females were further (equally)
subdivided into whether they were at low or high risk of a heart attack. Is there any difference
in the impact of each drug on cholesterol concentration? If any, which one has the highest
impact, in terms of the lowest cholesterol concentration?
Data Exploration
str(heartattack)
$ cholesterol: num 5.24 5.08 4.68 5.36 4.96 ...
$ id : int 1 2 3 4 5 6 7 8 9 10 ...

Therefore, you need to decide which method to use. So you start with checking the Answer
of the cholesterol concentration on each group by using the Answer
drug p-value
A 0.1537620
B 0.7674545
C 0.5537145
Statistical Tests
to find whether there is any difference between the drugs used toward the cholesterol concentration. Due to the
nature of the problem, you ran a Answer
            Df Sum Sq Mean Sq F value Pr(>F)
drug         2  1.235  0.6177    2.63 0.0793 .
Residuals   69 16.204  0.2348
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Based on the result, you decide to Answer

Fit: [HIDDEN]
$drug
diff lwr upr p adj
B-A -0.277327333 -0.6124096 0.05775494 0.1241979
C-A -0.278421280 -0.6135035 0.05666099 0.1222405
C-B -0.001093947 -0.3361762 0.33398832 0.9999663
Conclusion
Mark 5.00 out of 5.00
(Credit: the dataset used in this vignette is based on the heartattack dataset in the datarium
package)
Question 9
Incorrect
Mark 0.00 out of 1.00
The alternative hypothesis of a _____ t-test has the form of "The mean of x of the A group is higher than
..."
Select one:
a. Unpaired
b. Two-tail
c. Paired
d. Half-tail
e. One-tail
Feedback
The best choice is: One-tail
Question 10
Incorrect
Mark 0.00 out of 1.00
The following table contains a subset of the results from a survey about how the first year
UNSRAT undergraduate students access e-Learning.
questionnaire_code program satisfaction
STU001 informatics 4
STU002 civil 7
STU003 law 3
STU004 medical 5
Feedback
Your answer is incorrect.
The correct answer is: Element → STU003, Observation → 7, Variable → satisfaction
1,341 undergraduate students were surveyed, to gain knowledge about the preferred teaching-
and-learning method of the whole UNSRAT students. There are three teaching-and-learning
methods: online, offline, or blended. The answers then tabulated and the frequency of each
method is presented in the report. Match the item/condition from the example above with the
right term!

The correct answer is: Population → UNSRAT students, Variable → preferred teaching-and-
learning method, Statistics → frequency, Samples → the surveyed 1,341 undergraduate
students
You are assigned to study whether there is a relationship between the category and the content rating of
selected apps in Google PlayStore. You have a dataset with the following structure:
$ App : Factor w/ 3088 levels "¡Ay Caramba!",..: 2472 2679 654 2617 580 1701 2595 790 2762 2343 ...
$ Category : Factor w/ 3 levels "FAMILY","GAME",..: 2 2 2 2 2 2 2 2 2 2 ...
$ Rating : num 4.5 4.5 4.4 4.7 4.5 4.2 4.4 4.6 4.3 4.3 ...
$ Reviews : Factor w/ 2379 levels "0","1","10","100",..: 1556 1061 828 969 412 1363 1730 865 2175 68 ...
$ Size : Factor w/ 219 levels "1.0M","1.1M",..: 143 165 163 43 93 45 219 215 136 45 ...
$ Installs : Factor w/ 21 levels "0","0+","1,000,000,000+",..: 10 3 19 7 7 16 10 10 19 19 ...
$ Type : Factor w/ 3 levels "Free","NaN","Paid": 1 1 1 1 1 1 1 1 1 1 ...
$ Price : Factor w/ 38 levels "$0.99","$1.04",..: 38 38 38 38 38 38 38 38 38 38 ...
$ Content.Rating: Factor w/ 4 levels "Everyone","Everyone 10+",..: 2 2 1 1 1 1 1 2 1 1 ...
$ Genres : Factor w/ 85 levels "Action","Action;Action & Adventure",..: 4 7 23 19 23 29 1 77 1 23 ...
$ Last.Updated : Factor w/ 921 levels "April 1, 2017",..: 460 396 465 78 412 25 735 536 465 691 ...
$ Current.Ver : Factor w/ 1094 levels "0.0.1","0.0.2",..: 606 444 186 551 249 348 1093 600 347 332 ...
$ Android.Ver : Factor w/ 24 levels "1.5 and up","1.6 and up",..: 15 15 15 15 14 15 8 15 13 13 ...
You explored the data by making a barplot that shows the grouped distribution, and it came as
follows:
To achieve the goal of the study, you create a Cross-tabulation table.
Content.Rating
FAMILY 1529 131 50 261
GAME 608 131 74 331

data: [HIDDEN]
There is no significant relationship between the category and the content rating of the
selected apps from Google PlayStore.
There is a significant relationship between the category and the content rating of the
selected apps from Google PlayStore.
A statistical test that is conducted to determine whether there is an association between 2
categorical variables is
The correct answer is: Test of Independence
The correct answers are: Number of students in a class, Number of students in the whole
university, Grades frequency at the end of a course, Number of children in a family
Background
You are assigned to analyze a dataset that contains the performance score measures of participants at
two-time points. The aim of this study is to evaluate the effect of gender and stress on performance
scores. Is there any difference in performance between different stress levels? If any, which one
yields the best/worst performance score?
Data Exploration
str(performance)
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
$ gender: Factor w/ 2 levels "male","female": 1 1 1 1 1 1 1 1 1 1 ...
$ stress: Factor w/ 3 levels "low","moderate",..: 1 1 1 1 1 1 1 1 1 1 ...
$ t1 : num 5.96 5.51 5.63 5.71 5.74 ...
$ t2 : num 5.58 5.82 5.47 5.79 5.72 ...
Boundary
Since the performance is measured twice, in this problem we only focus on the first measurement (the
t1 column).
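You then plotted a boxplot.
Since you were comparing a variable in three different groups, you need the proper method.
Therefore, you need to decide which method to use. So you start with checking the distribution
normality of the performance score on each group by using the Shapiro-Wilk Test. The p-value for
each tested group is shown in the following table: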
stress level p-value
low 0.11428304
moderate 0.07023834
high 0.92983350
Statistical Tests
Based on it, you decide to use a parametric method
to find whether there is any difference between the stress levels on the performance. Due to the nature of
the problem, you ran a One-way ANOVA and the result is as follows:
            Df Sum Sq Mean Sq F value   Pr(>F)
stress       2 0.8235  0.4117    14.5 8.13e-06 ***
Residuals   57 1.6190  0.0284
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Based on the result, you decide to
continue with further tests to find which stress level has a significantly different impact on performance.
Therefore you ran a TukeyHSD and the results came as follows:

Fit: [HIDDEN]
$stress
diff lwr upr p adj
moderate-low 0.1052102 -0.02303774 0.2334582 0.1279077
high-low -0.1786052 -0.30685319 -0.0503573 0.0040329
high-moderate -0.2838155 -0.41206340 -0.1555675 0.0000053
Conclusion
employees with a moderate stress level tend to have significantly higher performance than
those with low and/or high stress levels
the experiment is a violation of human rights
there is no significant performance difference across all stress levels
there is no significant performance difference between employees with moderate and low
stress levels
employees with a high stress level tend to have significantly lower performance compared
to employees with moderate and low stress levels
employees with a high stress level have significantly lower performance, followed by those
with moderate, and then low stress levels
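The parametric route used in this question (one-way ANOVA followed by Tukey's HSD) can be sketched in base R as follows. The data are simulated and are not the performance dataset; the group means and spread are assumptions chosen only to give a similar layout of 60 observations in three stress levels.

```r
set.seed(42)  # simulated data, NOT the performance dataset from the question
perf_sim <- data.frame(
  stress = factor(rep(c("low", "moderate", "high"), each = 20),
                  levels = c("low", "moderate", "high")),
  t1 = c(rnorm(20, 5.7, 0.17), rnorm(20, 5.8, 0.17), rnorm(20, 5.5, 0.17))
)

# One-way ANOVA: does mean performance differ between stress levels?
fit <- aov(t1 ~ stress, data = perf_sim)
print(summary(fit))

# Tukey's HSD: pairwise differences with confidence bounds and adjusted p-values
tukey <- TukeyHSD(fit)
print(tukey$stress)
```

In the Tukey table, a pair differs significantly when its adjusted p-value is below 0.05, or equivalently when the interval from lwr to upr excludes zero. That is how the quiz output shows a difference for high-low and high-moderate but not for moderate-low.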
Suppose that you are interested in the percentage of cellphone brands owned by the students
of UNSRAT. Therefore, on Wednesday, after class, you asked all your classmates about the
brands of their cellphones.



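• It assumes your classmates represent the whole population of UNSRAT students.
• It assumes the percentage of the cellphone brands owned by the first-year students may
represent the whole population of UNSRAT students.
b. Because a variable is a characteristic of each individual on which data is collected, which of
the following are variables in this study?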
gender
cellphone brand
• cellphone brand
line plot
boxplot
time plot
pie chart
bar graph
• bar graph
• pie chart
200 data of advertising budget using YouTube and the respective sales earning were collected. You are
asked to analyze whether increasing the advertising budget would increase the sales. The
following dataset is given to you
str(marketing)
$ youtube : num 276.1 53.4 20.6 181.8 217 ...
$ sales : num 26.5 12.5 11.2 22.2 15.5 ...
The youtube column shows the advertising budget spending, and the sales column shows the
earning. All these numbers are in thousands of dollar.
The first thing that you need to do is:
Determine whether there is a strong relationship between the spending on YouTube advertisement service and the sales earning.
Therefore, you plotted the advertising budget and its respective sales:
After that, you ran a Pearson correlation test and the result is R = 0.7822.
Based on the value of R, you know that
there is a strong positive relationship between the spending on YouTube advertisement service and the sales earning.
Therefore, you decided to summarize and study the relationship between the advertising budget and
the sales by using Simple Linear Regression. The results came as follows:

Call:
lm(formula = sales ~ youtube, data = marketing)
Residuals:
     Min       1Q   Median       3Q      Max
-10.0632  -2.3454  -0.2295   2.4805   8.6548
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.439112   0.549412   15.36   <2e-16 ***
youtube     0.047537   0.002691   17.67   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
From the model, you could conclude that
• For an advertising budget that equals to zero, a company may expect a sale of USD
8,440.
• For each dollar spent for advertising, a company could expect a sales earning of USD
47.537.
• the intercept 8.439 is a strong predictor of the model
Based on the model, for a company that spent USD 1,000 for YouTube advertising, the company could
expect sales earning of USD 55976.112
Qualitative data could be organized with the following ways:
Select one or more:

The correct answers are: Frequency distribution table, Tally marks, Relative frequency,
Percentage
10
The correct answer is: length()

1. 200 data of advertising budget using YouTube and the respective sales
earning were collected. You are asked to analyze whether increasing the
advertising budget would increase the sales. The following dataset is given to
you
str(marketing)
$ youtube : num 276.1 53.4 20.6 181.8 217 ...
$ sales : num 26.5 12.5 11.2 22.2 15.5 ...
The youtube column shows the advertising budget spending, and the sales
column shows the earning. All these numbers are in thousands of dollar.
The first thing that you need to do is

The correct answer: Determine whether there is a strong relationship between the
spending on YouTube advertisement service and the sales earning
After that, you ran a Pearson correlation test and the result is R = 0.7822.
Based on the value of R, you know that
There is a strong positive relationship between the spending on YouTube
advertisement service and the sales earning.
Therefore, you decided to summarize and study the relationship between the
advertising budget and the sales by using Simple Linear
Regression. The results came as follows:
Call:
lm(formula = sales ~ youtube, data = marketing)
Residuals:
     Min       1Q   Median       3Q      Max
-10.0632  -2.3454  -0.2295   2.4805   8.6548
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.439112   0.549412   15.36   <2e-16 ***
youtube     0.047537   0.002691   17.67   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

• For each dollar spent for advertising, a company could expect a sales
earning of USD 47.537.
• the intercept 8.439 is a strong predictor of the model
• For an advertising budget that equals to zero, a company may expect a sale of USD 8,440
Based on the model, for a company that spent USD 1,000 for YouTube advertising,
the company could expect sales earning of USD 55976.112
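The quoted figure can be reproduced from the two coefficients in the regression output. The sketch below shows only that arithmetic. It assumes, as the quiz answer implies, that the value 1000 is entered directly as youtube and that the prediction is then converted from thousands of dollars to dollars.

```r
# Coefficients copied from the regression output above
intercept <- 8.439112
slope     <- 0.047537

youtube <- 1000  # assumption: the quiz enters 1000 directly as the predictor

sales_thousands <- intercept + slope * youtube  # prediction in thousands of USD
sales_usd       <- sales_thousands * 1000       # converted to USD

print(sales_usd)
```

This yields 55976.112. Because both columns are stated to be in thousands of dollars, a budget of USD 1,000 would strictly correspond to youtube = 1, so the figure is best read as the quiz's intended calculation.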
2. You are interested in learning the favorite programming languages of the first
year Indonesian Informatics and/or Computer Science undergraduate
students. To achieve this mission, you asked your highschool classmates who
were admitted to the specified program.
a. The data are collected properly and bias is minimized TRUE

b. Because a variable is a characteristic of each individual on which data is
collected, which of the following are variables that suit well with the research
question?
c. Which chart or graph would be appropriate to display the concerned
variable(s)?
• a pie chart
• a bar graph
3. You are assigned to study if there is any connection between the district
where a person lives and his/her hobby. There are 671 randomly selected
respondents that were interviewed. Their answers are collected into a data
frame with the following structure:
str(district.hobby)
$ district: Factor w/ 4 levels "DISTRICT 1","DISTRICT 2",..: 2 1 4 4 2 2 2 3 1 4 ...
$ hobby : Factor w/ 6 levels "BASKETBALL","FOOTBALL",..: 3 3 2 6 2 3 2 3 5 2 ...
You explored the data by making a barplot that shows the grouped
distribution, and it came out as follows:
To achieve the goal of the study, you created a
Cross-Tabulation:
            hobby
district     BASKETBALL FOOTBALL PAINTING PHOTOGRAPHY SINGING TRAVELING
  DISTRICT 1         39       29       19          28      37        29
  DISTRICT 2         29       33       29          30      25        32
  DISTRICT 3         26       24       30          22      30        19
  DISTRICT 4         28       36       23          24      26        24
With a 95% confidence level, you ran a
Chi-square Test of Independence (the correct answer), and the result came
out as follows:
data: [HIDDEN]

• Someone's hobby is independent of the district where one lives
• There is no significant relationship between the district where
someone lives and his/her hobby.
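Because the test output is hidden in the quiz, the conclusion can be checked by rerunning the test on the cross-tabulation. This is a sketch in R, assuming the counts are entered by hand from the table; the object names counts and result are illustrative.

```r
# Observed counts copied from the district-by-hobby cross-tabulation.
counts <- matrix(
  c(39, 29, 19, 28, 37, 29,
    29, 33, 29, 30, 25, 32,
    26, 24, 30, 22, 30, 19,
    28, 36, 23, 24, 26, 24),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    district = paste("DISTRICT", 1:4),
    hobby = c("BASKETBALL", "FOOTBALL", "PAINTING",
              "PHOTOGRAPHY", "SINGING", "TRAVELING")
  )
)

sum(counts)                    # 671 respondents, matching the question
result <- chisq.test(counts)   # Chi-square Test of Independence
result$parameter               # degrees of freedom: (4 - 1) * (6 - 1) = 15
result$p.value > 0.05          # TRUE means independence is not rejected at the 5% level
```

A p-value above 0.05 is consistent with both selected conclusions: the hobby is independent of the district, and there is no significant relationship between them.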
4. To find the number of elements in a vector, we use the
_____ function.
Select one:
• length()
5. 1,341 undergraduate students were surveyed to learn about the
preferred teaching-and-learning method of all UNSRAT students.
There are three teaching-and-learning methods: online, offline, or
blended. The answers were then tabulated, and the frequency of each
method is presented in the report. Match each item/condition from
the example above with the right term!
Population → UNSRAT students
Statistics → Frequency
Samples → The surveyed 1,341 undergraduate students
Variable → Preferred teaching-and-learning method
6. Which of the following are best treated as ordinal variables?

Grades
7. A statistical test that compares or tests the suitability of observations
against expectations or their theoretical frequencies is
Goodness-of-Fit
8. Background
You are assigned to analyze a dataset that contains measures of cholesterol
concentration in 72 participants treated with three different drugs. The aim is to
examine the potential of a new class of drugs in lowering cholesterol concentration
and consequently reducing heart attack. The participants include 36 males and 36
females. Males and females were further (equally) subdivided into whether they
were at low or high risk of a heart attack. Is there any difference in the impact of
each drug on cholesterol concentration? If any, which one has the highest
impact, in terms of the lowest cholesterol concentration?
Data Exploration
str(heartattack)
$ gender : Factor w/ 2 levels "male","female": 1 1 1 1 1 1 1 1 1 1 ...
$ risk : Factor w/ 2 levels "high","low": 2 2 2 2 2 2 2 2 2 2 ...
$ drug : Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
$ cholesterol: num 5.24 5.08 4.68 5.36 4.96 ...
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
You then plotted a boxplot and the result came out as follows:

Since you were comparing a variable across three different groups, you need the
proper method. Therefore, you need to decide which method to use. So you start
by checking the distribution normality of the cholesterol concentration in each
group by using the Shapiro-Wilk Test. The p-value for each tested group is shown in
the following table:
drug p-value
A 0.1537620
B 0.7674545
C 0.5537145
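The decision that follows from this table can be sketched in R. The significance level of 0.05 is an assumption (the quiz does not state it here), and the vector name p_values is illustrative.

```r
# Shapiro-Wilk p-values per drug group, copied from the table above.
p_values <- c(A = 0.1537620, B = 0.7674545, C = 0.5537145)
alpha <- 0.05  # assumed significance level

# A p-value above alpha means normality is not rejected for that group.
p_values > alpha
all(p_values > alpha)  # TRUE: no group departs significantly from normality
```

Since normality is not rejected in any group, a parametric method is a reasonable choice.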
Statistical Tests
Based on it, you decide to use a parametric method to find whether there is any
difference between the drugs in their effect on the cholesterol concentration. Due to the
nature of the problem, you ran a One-way ANOVA and the result is as follows:
            Df Sum Sq Mean Sq F value Pr(>F)
drug         2  1.235  0.6177    2.63 0.0793 .
Residuals   69 16.204  0.2348
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
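The F value and p-value in the ANOVA table can be recomputed from the sums of squares and degrees of freedom. This is a sketch in R using only the rounded figures printed in the table, so the results agree with the table up to rounding; the variable names are illustrative and the 0.05 threshold is an assumption.

```r
# Figures copied from the ANOVA table (rounded as printed).
ss_drug <- 1.235;   df_drug <- 2
ss_resid <- 16.204; df_resid <- 69

# F = mean square between groups / mean square within groups.
f_value <- (ss_drug / df_drug) / (ss_resid / df_resid)

# Upper-tail probability of the F distribution with (2, 69) degrees of freedom.
p_value <- pf(f_value, df_drug, df_resid, lower.tail = FALSE)

round(f_value, 2)  # 2.63
p_value            # approximately 0.079
p_value < 0.05     # FALSE: no significant difference between the drugs
```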
Based on the result, you decide to
draw a final conclusion.
Therefore you ran [FURTHER TEST IS UNNECESSARY],
and the results came out as follows:
Fit: [HIDDEN]
$drug
diff lwr upr p adj
B-A -0.277327333 -0.6124096 0.05775494 0.1241979
C-A -0.278421280 -0.6135035 0.05666099 0.1222405
C-B -0.001093947 -0.3361762 0.33398832 0.9999663
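The pairwise table can be read by checking whether each interval contains zero. The sketch below, in R, re-enters the printed values by hand; the data frame name pairwise and the 0.05 threshold are assumptions.

```r
# Pairwise differences copied from the output above.
pairwise <- data.frame(
  pair  = c("B-A", "C-A", "C-B"),
  diff  = c(-0.277327333, -0.278421280, -0.001093947),
  lwr   = c(-0.6124096, -0.6135035, -0.3361762),
  upr   = c(0.05775494, 0.05666099, 0.33398832),
  p_adj = c(0.1241979, 0.1222405, 0.9999663)
)

# An interval that spans zero means the pair is not significantly different.
pairwise$contains_zero <- pairwise$lwr < 0 & pairwise$upr > 0
pairwise[, c("pair", "contains_zero")]

any(pairwise$p_adj < 0.05)  # FALSE: no pair differs at the 5% level
```

This agrees with the ANOVA result: none of the three drugs shows a significantly different effect on cholesterol concentration.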
Conclusion
9. The following table contains a subset of the results from a survey about how the
first year UNSRAT undergraduate students access e-Learning.
STU003 law NA
Observation → personal notebook/PC
Element → STU001
10. Quantitative data visualization which separates the first digit and
the other digits is
Select one:
• Stem-and-leaf display
Question 1
Partially correct
Mark 0.67 out of 1.00
Question text
Background
You are assigned to analyze a dataset that contains measures of cholesterol concentration in 72 participants
treated with three different drugs. The aim is to examine the potential of a new class of drugs in lowering
cholesterol concentration and consequently reducing heart attack. The participants include 36 males and 36
females. Males and females were further (equally) subdivided into whether they were at low or high risk of
a heart attack. Is there any difference in the impact of each drug on cholesterol concentration? If any,
which one has the highest impact, in terms of the lowest cholesterol concentration?
Data Exploration
str(heartattack)
$ cholesterol: num 5.24 5.08 4.68 5.36 4.96 ...
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
boxplot
Therefore, you need to decide which method to use. So you start by checking the [Answer: distribution normality]
of the cholesterol concentration in each group by using the [Answer: Shapiro-Wilk Test]
drug p-value
A 0.1537620
B 0.7674545
C 0.5537145
Statistical Tests
Based on it, you decide to use a [Answer: parametric method]
to find whether there is any difference between the drugs in their effect on the cholesterol concentration. Due
to the nature of the problem, you ran a [Answer: One-way ANOVA]
drug 2 1.235 0.6177 2.63 0.0793 .
Residuals 69 16.204 0.2348
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Based on the result, you decide to [Answer: draw a final conclusion]
[FURTHER TEST IS UNNECESSARY]

Fit: [HIDDEN]
$drug
diff lwr upr p adj
B-A -0.277327333 -0.6124096 0.05775494 0.1241979
C-A -0.278421280 -0.6135035 0.05666099 0.1222405
C-B -0.001093947 -0.3361762 0.33398832 0.9999663
Conclusion

Mark -15.00 out of 5.00
(Credit: the dataset used in this vignette is based on the heartattack dataset in the datarium package)
Question 2
Correct
Mark 1.00 out of 1.00
Question text
The following table contains a subset of the results from a survey about the achievement of the present
second-year students in your program in their first semester.
questionnaire_code 1st_GP
STU001 3.92
STU002 3.88
STU003 3.2
STU004 2.78
Element → STU001
Variable → 1st_GP
Observation → 3.92
Feedback
The correct answer is: Element → STU001, Variable → 1st_GP, Observation → 3.92
Question 3
Correct
Mark 1.00 out of 1.00
Question text
bookstores are asking for more programming books to be stocked for each of them. Since the stock in the
company's warehouse is limited, hence the management will decide the allocation based on historical sales
data. Therefore, the management asked you to make the analysis. The data that they possess contains
historical data of the number of students who took programming courses and the number of programming
books sold at the respective university bookstore. Should a university with more students who took a
programming course be allocated more books?

str(student.books)
$ nstudents : int 204 179 200 177 207 195 166 178 213 130 ...
$ books_sold: int 441 329 467 376 504 396 354 439 461 235 ...
number of books sold at a university bookstore with the respective number of students who took
the programming course.
The first thing that you need to do is [Answer: Determine whether there is a strong relationship between the number of students taking a programming course and the number of programming books sold at a particular university bookstore.]

Based on the value of R, you know that [Answer: There is a strong positive relationship between the number of students taking a programming course and the number of programming books sold at a particular university bookstore.]
and the number of books sold by using [Answer]
Call:
Residuals:
-80.265 -37.203 -2.531 38.198 83.988
Coefficients:
(Intercept) 1.1796 23.6051 0.05 0.96
nstudents 2.1165 0.1166 18.15 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

however, since the factor itself is not significant, the bookstore should not cling to that.
Mark 3.00 out of 3.00
that.
• For each student taking the programming class, the respective university bookstore could
.
Based on the model, for a university with 100 students taking a programming course, the publisher could
expect the respective bookstore would sell [Answer: 212.82]
Question 4
Correct
Mark 1.00 out of 1.00
Question text
You are assigned to study if there is any connection between the district where a person lives and his/her
hobby. There are 671 randomly selected respondents that were interviewed. Their answers are collected
into a data frame with the following structure:
str(district.hobby)
$ district: Factor w/ 4 levels "DISTRICT 1","DISTRICT 2",..: 2 1 4 4 2 2 2 3 1 4 ...
$ hobby : Factor w/ 6 levels "BASKETBALL","FOOTBALL",..: 3 3 2 6 2 3 2 3 5 2 ...
Cross-tabulation
table.
hobby
district BASKETBALL FOOTBALL PAINTING PHOTOGRAPHY SINGING TRAVELING
DISTRICT 1 39 29 19 28 37 29
DISTRICT 2 29 33 29 30 25 32
DISTRICT 3 26 24 30 22 30 19
DISTRICT 4 28 36 23 24 26 24

data: [HIDDEN]
hobby.
hobby.
Some hobbies are significantly preferred in certain districts.
Someone's hobby is independent of the district where one lives.

Mark 2.00 out of 2.00
• There is no significant relationship between the district where someone lives and his/her
hobby.
• Someone's hobby is independent of the district where one lives.
Question 5
Correct
Mark 1.00 out of 1.00
Question text
Methods that can be used to find out whether the data is normally distributed or not are
a. Applying the Kolmogorov-Smirnov test
b. Applying the Kruskal-Wallis test
c. Applying the Wilcoxon Rank Sum test
d. Observing pie chart and bar plot
e. Applying the Shapiro-Wilk test
f. Observing histogram and density plot

Feedback
The best choices are: Observing histogram and density plot, Applying the Shapiro-Wilk
test, Applying the Kolmogorov-Smirnov test
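The three accepted approaches can be tried in R as follows. This is only a sketch on simulated data (the sample x is not from the quiz), and the seed and distribution parameters are arbitrary assumptions. Note that when the mean and standard deviation passed to ks.test() are estimated from the same sample, its p-value is only approximate.

```r
set.seed(42)                          # arbitrary seed, for reproducibility
x <- rnorm(100, mean = 5, sd = 0.5)   # simulated sample, not the quiz data

# Formal tests: a large p-value means normality is not rejected.
sw <- shapiro.test(x)
ks <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))
sw$p.value
ks$p.value

# Visual checks: compute the histogram and density estimate;
# plot(h) and plot(d) would draw them.
h <- hist(x, plot = FALSE)
d <- density(x)
```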
Question 6
Correct
Mark 1.00 out of 1.00
Question text
You are interested in knowing the achievement of the present second-year students in your
program at their first semester. It is measured according to the GP achieved. You then collected
the 1st semester GP of 31 randomly selected second-year students and calculate the mean.
Population → Second-year students
Parameter → GP
Samples → 31 randomly selected second-year students
Statistics → Average
Feedback
The correct answer is: Population → Second-year students, Parameter → GP, Samples → 31
randomly selected second-year students, Statistics → Average
Question 7
Correct
Mark 1.00 out of 1.00
Question text
Qualitative data could be visualized in the following ways:
a. Raw data
b. Bar plot
c. Pie chart
d. Boxplot
e. Percentage
f. Relative frequency
g. Frequency distribution table
h. Tally marks
Feedback
The best choices are: Bar plot, Pie chart
Question 8
Correct
Mark 1.00 out of 1.00
Question text
Suppose that you are interested in the percentage of cellphone brands owned by the students of
UNSRAT. Therefore, on Wednesday, after class, you asked all your classmates about the brands
of their cellphones.
Mark 1.00 out of 1.00

• It assumes your classmates represent the whole population of UNSRAT students.
• It assumes the percentage of the cellphone brands owned by the first-year students may

the following are variables in this study?
gender
cellphone brand

Mark 1.00 out of 1.00
• cellphone brand
pie chart
time plot
boxplot
line plot
bar graph
Mark 1.00 out of 1.00
• bar graph
• pie chart
Question 9
Correct
Mark 1.00 out of 1.00
Question text
The correct command(s) to create a sequence of numbers in R is (are)?
a. seq(1, 10)
b. seq(10, 1)
c. 1:10
d. seq(10)
e. seq(10, 1, 1)
f. seq(1, 10, -1)
g. seq(10, 1, -1)
Feedback
The best choices are: 1:10, seq(1, 10), seq(10, 1), seq(10, 1, -1), seq(10)
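The accepted and rejected options can be checked directly in R. In this sketch the helper fails() is a hypothetical convenience function, not part of R; it reports whether an expression raises an error.

```r
# Valid ways to build a sequence.
seq(1, 10)       # 1 to 10
seq(10, 1)       # 10 down to 1
1:10             # 1 to 10
seq(10)          # same values as 1:10
seq(10, 1, -1)   # 10 down to 1, explicit step of -1

# Hypothetical helper: TRUE if evaluating the expression raises an error.
fails <- function(expr) {
  tryCatch({ expr; FALSE }, error = function(e) TRUE)
}

# The two rejected options: the sign of the step contradicts the direction.
fails(seq(10, 1, 1))    # TRUE
fails(seq(1, 10, -1))   # TRUE
```

The last two calls fail because the step points away from the end value, which is why they are not among the best choices.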
Question 10
Correct
Mark 1.00 out of 1.00
Question text
a. Number of students who achieve pass grades

b. The distance between two cities
c. Height
d. Number of classes in a college building

Feedback
The best choices are: Number of classes in a college building, Number of students who
achieve pass grades

2. Which function(s) that is(are) used to store a workspace in R?

Select one or more:

b. Wilcoxon Rank Sum Test

number of students who chose particular programming as their favorite one

chosen programming language

the respondent's final score in algorithm and programming course

c. Which chart or graph would be appropriate to display the concerned variable(s)?

Which one of the following that is best treated as ordinal variables?

Population UNSRAT students

Samples the surveyed 1,341 undergraduate students

Variable preferred teaching-and-learning method

What can be learned from a histogram and/or a stem-and-leaf display?

Select one or more:

Platform Action Adventure Fighting Misc Racing Role-Playing Shooter Simulation

DS 343 240 36 393 67 200 42 285

PC 165 65 6 24 60 104 148 115

PS2 348 196 150 222 216 187 160 90

PS3 380 74 76 124 92 119 156 31

PSP 222 213 74 106 65 192 37 29

Wii 238 84 42 280 94 35 66 87

X360 324 47 65 126 105 76 203 40

XB 155 26 48 46 123 23 132 24

Platform Sports Strategy

Chi-square Test of Independence

X-squared = 2873.1, df = 81, p-value < 2.2e-16

There is no significant relationship between genres and the platform used.

questionnaire_code program access_mean

STU001 informatics personal notebook/PC

STU002 civil shared notebook/PC

STU004 medical personal tablet

Observation Answer 2 personal notebook/PC

$ score : num 0 23.8 26 20.4 12.1 ...

$ test : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...

data: score and test and student

[HIDDEN] chi-squared = 30.778, df = 2, p-value = 2.073e-07

[NAME of TEST HIDDEN]

data: score and test

P value adjustment method: bonferroni

There is no significantly different score achievements in all three quizzes

significantly lower than the other two tests.

Samples 500 randomly selected first year UNSRAT undergraduate students

Parameter The means used to access e-Learning

$ gender: Factor w/ 2 levels "male","female": 1 1 1 1 1 1 1 1 1 1 ...

$ stress: Factor w/ 3 levels "low","moderate",..: 1 1 1 1 1 1 1 1 1 1 ...

$ t1 : num 5.96 5.51 5.63 5.71 5.74 ...

$ t2 : num 5.58 5.82 5.47 5.79 5.72 ...