         High   Low
Female    50     50

         High   Low
Male      25     35
Female    20     20
ii Percentage of employees who are female and have a low stress level = 40/200 = 20% 2 marks
2b Constructing the frequency contingency table by gender and stress level using the Excel COUNTIFS
function: 3 marks
=COUNTIFS(B2:B200, "Male", C2:C200, "High")
=COUNTIFS(B2:B200, "Female", C2:C200, "High")
=COUNTIFS(B2:B200, "Male", C2:C200, "Low")
=COUNTIFS(B2:B200, "Female", C2:C200, "Low")
3 marks for any three of the above formulas
2c Command to import the Excel data file (saved as CSV) into R: data = read.csv("filename.csv") 1 mark
Contingency table command: table(data$gender, data$stress) 2 marks
Marks should not be deducted if the student merely mentions table(gender, stress)
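As a minimal sketch of 2c, the R snippet below builds a small hypothetical data frame and tabulates it. The six rows are illustrative assumptions standing in for the data returned by read.csv; the column names gender and stress follow the command above. prop.table() then expresses each cell as a proportion of the total, which is the kind of quantity asked for in part ii.

```r
# Hypothetical stand-in for the imported data; the real file is not shown here.
data <- data.frame(
  gender = c("Male", "Female", "Female", "Male", "Female", "Male"),
  stress = c("High", "Low", "High", "Low", "Low", "Low")
)

# Frequency contingency table: rows are gender, columns are stress level.
freq <- table(data$gender, data$stress)
print(freq)

# Each cell as a proportion of all observations (multiply by 100 for percentages).
prop <- prop.table(freq)
print(prop)
```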
3a i Mean and standard deviation of the 3 types of balls: (1 mark for mean + 2 marks for S.D.) × 3 = 9 marks
If the last value for red balls were 35 instead of 50, the mean would fall to 36.3. 1 mark
ii A column (bar) chart represents the mean diameters of the three types of balls most efficiently.
Reason: a scatter plot shows the relationship between two variables, and a line plot shows a trend
over time. A column chart is the only applicable option, because its three columns compare values
across categories. 3 marks
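A minimal sketch of 3a in R, assuming hypothetical diameters since the exam data are not reproduced here. The type names blue and green and all the values are illustrative assumptions. sapply() gives one mean and one sample standard deviation per ball type, and barplot() draws one column per category.

```r
# Hypothetical diameters; only the existence of three ball types comes from the question.
balls <- list(
  red   = c(36, 40, 44),
  blue  = c(20, 25, 30),
  green = c(10, 12, 14)
)

means <- sapply(balls, mean)  # one mean per ball type
sds   <- sapply(balls, sd)    # sample standard deviation (n - 1 denominator)
print(rbind(mean = means, sd = sds))

# One column per category, so the heights can be compared directly.
barplot(means, col = c("red", "blue", "green"),
        main = "Mean diameter by ball type",
        xlab = "Ball type", ylab = "Mean diameter")
```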
UPC : 12273303 Data Analysis (SEC) - Page 2 of 2 Semester : III
i The mean of 19.19 is greater than the median of 14.47. (In a normal distribution, the mean and median are
equal.)
ii The interquartile range of 14.2 is approximately 1.18 standard deviations. (In a normal distribution, the
interquartile range is 1.33 standard deviations.)
iii The range of 42.97 is equal to 3.57 standard deviations. (In a normal distribution, the range is approximately
6 standard deviations.)
Hence, the data are not normally distributed.
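The three checks above can be sketched in R as follows. The function name normality_checks and the sample x are hypothetical, chosen only to illustrate the comparison; the benchmark values in the comments are the ones quoted in the answer.

```r
# Compare sample summaries with what a normal distribution would give.
normality_checks <- function(x) {
  s <- sd(x)
  c(
    mean_minus_median = mean(x) - median(x),   # about 0 if normal
    iqr_in_sd         = IQR(x) / s,            # about 1.33 if normal
    range_in_sd       = (max(x) - min(x)) / s  # about 6 if normal
  )
}

# Hypothetical right-skewed sample, for illustration only.
x <- c(2, 3, 4, 5, 6, 7, 9, 12, 20, 45)
res <- normality_checks(x)
print(round(res, 2))
```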
4b Explanation of measure of skewness and kurtosis of a distribution. (Levine et al. (2017), P-132) 4 marks
Excel functions: SKEW, KURT (1*2)=2 marks
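Base R has no built-in skewness or kurtosis function, so the sketch below writes out the sample formulas that Excel documents for SKEW and KURT. The function names excel_skew and excel_kurt are hypothetical, and the ten data values are illustrative.

```r
# Sample skewness, using the bias-adjusted formula documented for Excel's SKEW.
excel_skew <- function(x) {
  n <- length(x)
  z <- (x - mean(x)) / sd(x)
  n / ((n - 1) * (n - 2)) * sum(z^3)
}

# Sample excess kurtosis, using the formula documented for Excel's KURT.
excel_kurt <- function(x) {
  n <- length(x)
  z <- (x - mean(x)) / sd(x)
  n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(z^4) -
    3 * (n - 1)^2 / ((n - 2) * (n - 3))
}

x <- c(3, 4, 5, 2, 3, 4, 5, 6, 4, 7)  # illustrative data
print(excel_skew(x))  # positive: the right tail is longer
print(excel_kurt(x))  # negative: flatter than a normal distribution
```

A positive skewness indicates a longer right tail, and a kurtosis value near zero indicates tails similar to those of a normal distribution.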
4c R commands:
i 4×4 matrix A using the sequence of numbers from 1 to 16.
A = matrix(seq(1,16), nrow = 4, ncol = 4) 2.5 marks
ii Matrix B, which is transpose of matrix A.
B=t(A) 1 mark
iii Matrix C, which is obtained by multiplication of matrix A with B:
C = A %*% B 1 mark
Marks should be deducted if a student simply writes A*B, which multiplies element by element rather than performing matrix multiplication.
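The three commands of 4c can be run together as below. The variable elementwise is a hypothetical name added only to show how the element-by-element product differs from the matrix product.

```r
A <- matrix(seq(1, 16), nrow = 4, ncol = 4)  # filled column by column
B <- t(A)                                    # transpose of A
C <- A %*% B                                 # matrix product

elementwise <- A * B  # multiplies matching entries; not a matrix product

print(C)
print(elementwise)
```

Because B is the transpose of A, the product C is symmetric, which gives a quick sanity check on the result.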
5a 95% CI for the population mean = [0.905, 0.948] 4 marks
Since 1 litre does not lie in the 95% CI, the distributor has the right to complain. 2 marks
5b Sampling error explanation. (Levine et al. (2017), P-265) 2 marks
When N=900, the sampling error is 0.02156.
When N=1089, the sampling error becomes 0.0196. Hence, the sampling error falls. 2 marks
Excel function to calculate sampling error= CONFIDENCE() 2 marks
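The two sampling errors can be reproduced with the sketch below, which computes z × σ/√n at the 95% level. The value sigma = 0.33 is an assumption inferred back from the quoted results, since the question data are not shown here, and the function name sampling_error is hypothetical. Excel's CONFIDENCE(alpha, standard_dev, size) performs the same calculation.

```r
# Margin of error for a mean when the population standard deviation is known.
sampling_error <- function(sigma, n, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value, about 1.96
  z * sigma / sqrt(n)
}

sigma <- 0.33  # assumption: inferred from the quoted results, not stated above

e900  <- sampling_error(sigma, n = 900)
e1089 <- sampling_error(sigma, n = 1089)
print(round(c(n900 = e900, n1089 = e1089), 5))
```

The error shrinks in proportion to 1/√n, so raising the sample size from 900 to 1089 scales it by 30/33.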
5c R command(s) for constructing a neatly labelled and colourful histogram with unequal bins. 4.5 marks
hist(marks, breaks = c(0,33,50,60,75,100), col = "tomato", main = "Number of students scoring marks", xlab = "Marks", ylab = "Number of Students")
Key elements the answer must contain: hist() with the arguments breaks, main, xlab, ylab, col
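A runnable sketch of the histogram command, using hypothetical marks because the exam data are not reproduced here. One point worth checking when running it: with unequal bins, hist() plots densities rather than counts by default, so this sketch labels the vertical axis "Density" and reads the per-bin counts from the returned object instead. Passing freq = TRUE would plot counts, but R then warns that the bar areas are misleading.

```r
# Hypothetical marks out of 100, for illustration only.
marks <- c(12, 28, 35, 41, 47, 52, 55, 58, 63, 68, 71, 74, 82, 90, 97)
bins  <- c(0, 33, 50, 60, 75, 100)

# With unequal bins, hist() draws densities by default (freq = FALSE),
# so that bar areas, not heights, are proportional to the counts.
h <- hist(marks, breaks = bins, col = "tomato",
          main = "Distribution of student marks",
          xlab = "Marks", ylab = "Density")

print(h$counts)  # number of students falling in each bin
```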
5c For Visually Impaired students: Explanation of the use of the following R commands: getwd() and setwd().
(Gardener, P-35) 4.5 marks
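A short illustration of the two commands: getwd() reports the current working directory, and setwd() changes it. The sketch switches to the session's temporary directory (chosen only because it always exists) and then restores the original location; old_dir is a hypothetical variable name.

```r
old_dir <- getwd()  # current working directory
print(old_dir)

setwd(tempdir())    # switch to the session's temporary directory
print(getwd())      # relative file names are now resolved against this folder

setwd(old_dir)      # restore the original working directory
```

Setting the working directory first lets a later call such as read.csv refer to a data file by name alone instead of by its full path.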
6a Let x1 and x2 be the mean revenue earned from City A and City B, respectively. Then the hypotheses are: 2 marks
H0 : x1 − x2 ≤ 0
H1 : x1 − x2 > 0
6b The hypotheses for testing the difference in mean revenue are:
H0 : x1 − x2 = 0; H1 : x1 − x2 ≠ 0 2 marks
t-stat = 1.76; t-critical (at the 5% level) = 2.05.
Since the t-stat is less than the t-critical value, we do not reject H0. Hence, there is no evidence of a difference in the revenue earned
in the two cities, and the firm is not justified in focusing on one city. 2 marks
6c The p-value for the two-tail test is 0.09.
Since 0.01 < 0.09 < 0.10, do not reject H0 at the 0.01 level of significance, but reject H0 at the 0.10 level. 2 marks
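The critical value and p-value quoted in 6b and 6c can be checked with R's t-distribution functions. The degrees of freedom (28) are an assumption inferred from the quoted critical value of 2.05, since the sample sizes are not shown here.

```r
t_stat <- 1.76
df     <- 28  # assumption: inferred from the quoted critical value of 2.05

t_crit  <- qt(1 - 0.05 / 2, df)      # two-tail critical value at the 5% level
p_value <- 2 * pt(-abs(t_stat), df)  # two-tail p-value

print(round(c(t_crit = t_crit, p_value = p_value), 2))

# Decision rule: reject H0 only if |t| exceeds the critical value.
reject_5pct <- abs(t_stat) > t_crit
print(reject_5pct)
```

Here pt() and qt() play the same role as Excel's T.DIST and T.INV families of functions.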
6d Suppose CityArevenue and CityBrevenue are the variable names.
t.test(CityArevenue, CityBrevenue, var.equal = TRUE, alternative = "greater") 2.5 marks
t.test(CityArevenue, CityBrevenue, var.equal = TRUE, alternative = "greater", conf.level = 0.99) 1 mark
t.test(CityArevenue, CityBrevenue, var.equal = TRUE, alternative = "greater", conf.level = 0.90) 1 mark
Key options to check: t.test(), var.equal, alternative, conf.level
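A runnable sketch of the 6d command, assuming hypothetical revenue figures because the exam data are not reproduced here. Note that conf.level changes only the confidence interval that t.test() reports; the t statistic and p-value are the same at every level, so the decision at a given significance level is made by comparing the p-value with that level.

```r
# Hypothetical revenue samples, for illustration only.
CityArevenue <- c(52, 48, 55, 60, 47, 51, 58, 53)
CityBrevenue <- c(45, 50, 44, 49, 52, 43, 47, 46)

# Pooled-variance t-test of H1: mean(A) - mean(B) > 0, reported with a 99% interval.
result <- t.test(CityArevenue, CityBrevenue,
                 var.equal = TRUE, alternative = "greater", conf.level = 0.99)

print(result$statistic)  # t statistic
print(result$parameter)  # degrees of freedom: n1 + n2 - 2
print(result$p.value)    # one-tail p-value
```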
6e Excel functions for the Student’s t-distribution: T.DIST.2T/T.DIST/TDIST 2 marks
Excel functions for the inverse of the Student’s t-distribution: T.INV.2T/T.INV/TINV 2 marks
(Levine et al. (2017), P-329).
Full credit is to be given for an explanation of any one of the above-mentioned functions for the Student’s t-distribution,
and any one for the inverse of the t-distribution.