
Set A Marking Scheme

Name of the Course : B. A. (H) Economics


Semester : III
Name of the Paper : Data Analysis (SEC)
Unique Paper Code : 12273303
Duration: 3 hours    Maximum Marks: 65

1. a Population of interest: 4000 full-time students of the engineering college. 2 marks


Probability sampling methods: Simple Random Sampling, Systematic Random Sampling, Stratified Sampling,
Cluster Sampling (Levine et al. (2017), Section 1.3) 4*2=8 marks
b Both the RAND and RANDBETWEEN functions generate random numbers in Excel. RAND returns a random number
greater than or equal to 0 and less than 1, whereas RANDBETWEEN returns a random integer between two
specified numbers (inclusive). For example, =RAND() will generate a number like 0.422245717, while
=RANDBETWEEN(1,100) will generate a random integer between 1 and 100.
Both functions return a new random number every time the worksheet is recalculated. 2 marks
Simple random sample of size 200 with replacement: enter =RANDBETWEEN(1,4000) in one cell and copy it to fill 200 cells 1 mark
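Since R is also used in this paper, the same draw can be sketched there. This is a minimal sketch, assuming the students are numbered 1 to 4000 (a hypothetical ID scheme); sample() with replace = TRUE plays the role of 200 independent =RANDBETWEEN(1,4000) cells. The seed value is arbitrary and only makes the draw reproducible.

```r
# Students assumed to be numbered 1 to 4000 (hypothetical ID scheme)
population_ids <- 1:4000

# Arbitrary seed so that the same sample is reproduced on every run
set.seed(2024)

# Simple random sample of size 200 WITH replacement: every draw can pick any
# of the 4000 IDs, so repeats are possible, as with 200 RANDBETWEEN cells
srs_ids <- sample(population_ids, size = 200, replace = TRUE)

head(srs_ids)
```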
c R commands: 1*3=3 marks
i Round 22/7 to 3 decimal places: round(22/7, digits = 3)
ii Round 18/7 up to the nearest integer (the smallest integer not less than 18/7): ceiling(18/7)
iii Round 17/5 down to the nearest integer (the largest integer not greater than 17/5): floor(17/5)
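The three commands can be checked directly; the comments give the unrounded values so the direction of each rounding is visible.

```r
x1 <- round(22 / 7, digits = 3)  # 22/7 = 3.142857..., to 3 decimal places
x2 <- ceiling(18 / 7)            # 18/7 = 2.571..., smallest integer >= it
x3 <- floor(17 / 5)              # 17/5 = 3.4, largest integer <= it

print(c(x1, x2, x3))             # 3.143, 3 and 3
```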

2a i Contingency table based on percentage of row total: 2 marks

          High     Low
Male      41.67    58.33
Female    50.00    50.00

Contingency table based on percentage of column total: 2 marks

          High     Low
Male      55.56    63.64
Female    44.44    36.36

Contingency table based on percentage of overall total: 2 marks

          High     Low
Male      25       35
Female    20       20

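Based on the row percentages, a larger share of females (50%) than of males (41.67%) report high stress, so females are at greater risk of high stress. 2 marks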

ii Percentage of employees who are female and have a low stress level = 40/200 × 100 = 20% 2 marks
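The three percentage tables can be reproduced in R with prop.table(). This is a sketch under one assumption: the cell counts (50, 70, 40, 40) are reconstructed from the overall-percentage table and the total of 200 employees, not given directly in this key. margin = 1 divides by row (gender) totals, margin = 2 by column (stress level) totals.

```r
# Cell counts implied by the overall-percentage table for 200 employees
# (e.g. 25% of 200 = 50 males with high stress); reconstructed, not given
counts <- matrix(c(50, 40, 70, 40), nrow = 2,
                 dimnames = list(Gender = c("Male", "Female"),
                                 Stress = c("High", "Low")))

row_pct     <- round(100 * prop.table(counts, margin = 1), 2)  # each row sums to 100
col_pct     <- round(100 * prop.table(counts, margin = 2), 2)  # each column sums to 100
overall_pct <- round(100 * prop.table(counts), 2)              # all four cells sum to 100

row_pct
col_pct
overall_pct
```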

2b Constructing the frequency contingency table of gender and stress level using the COUNTIFS Excel
function: 3 marks
=COUNTIFS(B2:B200, "Male", C2:C200, "High")
=COUNTIFS(B2:B200, "Female", C2:C200, "High")
=COUNTIFS(B2:B200, "Male", C2:C200, "Low")
=COUNTIFS(B2:B200, "Female", C2:C200, "Low")
3 marks for any three of the above formulas
2c Command to import the data file (saved from Excel in CSV format) into R: data = read.csv("filename.csv") 1 mark
Contingency table command: table(data$gender, data$stress) 2 marks
Marks should not be deducted if the student merely mentions table(gender, stress)
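A self-contained run of the 2c commands is sketched below. The five rows are hypothetical stand-ins for the exam data, and the column names gender and stress are assumed to match the commands above. The last line counts one cell with a logical condition, which mirrors what a COUNTIFS formula from 2b does.

```r
# Hypothetical stand-in for the exam data, written to a temporary CSV file
path <- tempfile(fileext = ".csv")
write.csv(data.frame(gender = c("Male", "Female", "Male", "Female", "Male"),
                     stress = c("High", "Low", "Low", "High", "Low")),
          path, row.names = FALSE)

# Import the CSV file and build the frequency contingency table
data <- read.csv(path)
tab <- table(data$gender, data$stress)
tab

# Cross-check one cell the way COUNTIFS(..., "Male", ..., "High") counts it
male_high <- sum(data$gender == "Male" & data$stress == "High")
```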

3a i Mean and standard deviation of 3 types of balls: (1 mark for Mean +2 marks for S.D.)*3= 9 marks

                      Red      Blue     Green
Mean                  37.8     35.8     35.2
Standard Deviation    6.142    6.285    3.425

If the last diameter for the red balls were 35 instead of 50, the mean would fall to 36.3. 1 mark
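Replacing one value changes the sum by 50 - 35 = 15, so the mean falls by 15/n. Assuming n = 10 red balls (consistent with the bag of 10 balls per colour in 3b and with the 36.3 quoted above), the arithmetic is:

```latex
\bar{x}_{\text{new}} = \bar{x}_{\text{old}} - \frac{50 - 35}{n}
  = 37.8 - \frac{15}{10} = 36.3
```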
ii A column (bar) chart represents the mean diameters of the three types of balls most effectively.
Reason: scatter plots show the relationship between two numerical variables, and line plots show a trend over
time. A column chart is the only applicable option: its three columns compare values across categories. 3 marks
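A sketch of such a column chart in R, using the means from the table in 3a(i); the colours, title and axis labels are illustrative choices, not part of the key.

```r
# Mean diameters from the table in 3a(i)
mean_diam <- c(Red = 37.8, Blue = 35.8, Green = 35.2)

# One column per type of ball; bar heights compare the category means.
# barplot() invisibly returns the x positions of the bar midpoints.
mids <- barplot(mean_diam,
                col = c("red", "blue", "green"),
                main = "Mean diameter by type of ball",
                xlab = "Type of ball", ylab = "Mean diameter")
```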

3b R commands: (1.5*2)=3 marks


i bag = rep(c("Red","Green","Blue"), times = c(10,10,10))
ii urnsamples(bag, size = 5, replace = FALSE) OR urnsamples(bag, size = 5)
(urnsamples() is in the prob package, loaded with library(prob).) Either of the two commands is acceptable.
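The bag from 3b(i) can be checked in base R. As far as I understand, prob::urnsamples() builds the sample space (the possible samples) rather than drawing one sample; to draw a single random sample of 5 without replacement, base R's sample() can be used instead. This is an illustration only and not a substitute for the command the key expects; the seed is arbitrary.

```r
# The bag: 10 balls of each colour, 30 in all
bag <- rep(c("Red", "Green", "Blue"), times = c(10, 10, 10))
table(bag)

# One random sample of 5 balls drawn without replacement (base R)
set.seed(1)
draw <- sample(bag, size = 5, replace = FALSE)
draw
```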
3a(ii) In lieu of Q3, Part (a)(ii), only for visually impaired students: Difference between discrete and
continuous numerical variables: Levine et al. (2017), P-38 3 marks
4a Comparing data characteristics to theoretical properties: (Levine et al. (2017), P 228-229) (2*3)=6 marks

i The mean of 19.19 is greater than the median of 14.47, which suggests right (positive) skewness. (In a normal
distribution, the mean and median are equal.)
ii The interquartile range of 14.2 is approximately 1.18 standard deviations. (In a normal distribution, the
interquartile range is 1.33 standard deviations.)
iii The range of 42.97 is equal to 3.57 standard deviations. (In a normal distribution, the range is approximately
6 standard deviations.)
Hence, the data are not normally distributed.
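The three comparisons in 4a can be computed from raw data. The helper name normality_checks is hypothetical, and the example data are simulated, not the exam data. For large normal samples the IQR is close to 1.35 standard deviations (the text rounds this to 1.33), while the range-to-SD ratio depends on the sample size.

```r
# Hypothetical helper: the summary measures compared in 4a
normality_checks <- function(x) {
  s <- sd(x)
  c(mean          = mean(x),
    median        = median(x),
    iqr_over_sd   = IQR(x) / s,           # roughly 1.33 to 1.35 for normal data
    range_over_sd = diff(range(x)) / s)   # about 6 in the textbook rule; grows with n
}

# Simulated normal data for comparison (not the exam data)
set.seed(123)
z <- rnorm(10000)
round(normality_checks(z), 2)
```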
4b Explanation of the measures of skewness and kurtosis of a distribution. (Levine et al. (2017), P-132) 4 marks
Excel functions: SKEW, KURT (1*2)=2 marks
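To show what SKEW and KURT compute, the sketch below writes their sample formulas out in R. The function names excel_skew and excel_kurt are hypothetical; the formulas are the adjusted sample skewness and sample excess kurtosis that Excel documents for these two functions.

```r
# Hypothetical helpers reproducing the formulas behind Excel's SKEW() and KURT()
excel_skew <- function(x) {
  n <- length(x)
  z <- (x - mean(x)) / sd(x)            # sd() uses the n - 1 denominator
  n / ((n - 1) * (n - 2)) * sum(z^3)
}

excel_kurt <- function(x) {
  n <- length(x)
  z <- (x - mean(x)) / sd(x)
  n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(z^4) -
    3 * (n - 1)^2 / ((n - 2) * (n - 3)) # excess kurtosis: near 0 for normal data
}

excel_skew(1:5)           # symmetric data: 0 (up to rounding)
excel_kurt(1:5)           # flatter than normal: -1.2
excel_skew(c(1, 1, 1, 10)) # long right tail: positive
```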
4c R commands:
i A 4×4 matrix A using the sequence of numbers from 1 to 16:
A = matrix(seq(1,16), nrow = 4, ncol = 4) 2.5 marks
ii Matrix B, which is transpose of matrix A.
B=t(A) 1 mark
iii Matrix C, which is obtained by multiplication of matrix A with B:
C = A %*% B 1 mark
Marks should be deducted if a student simply writes A*B, which gives the element-wise product rather than the matrix product.
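Running the three commands side by side with A*B shows why the distinction matters: %*% takes row-by-column dot products, while * multiplies matching cells.

```r
A <- matrix(seq(1, 16), nrow = 4, ncol = 4)  # filled column by column
B <- t(A)                                    # transpose of A
C <- A %*% B                                 # matrix product (4 x 4)
E <- A * B                                   # element-wise product, NOT the answer

C[1, 1]  # row 1 of A (1, 5, 9, 13) dotted with itself: 276
E[1, 1]  # only A[1, 1] * B[1, 1] = 1
```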
5a 95% CI for the population mean = [0.905, 0.948] 4 marks
1 litre does not lie in the 95% CI (the whole interval is below 1 litre); hence, the distributor has the right to complain. 2 marks
5b Sampling error explanation. (Levine et al. (2017), P-265) 2 marks
When n = 900, the sampling error is 0.02156.
When n = 1089, the sampling error becomes 0.0196. Hence, the sampling error falls as the sample size increases. 2 marks
Excel function to calculate the sampling error = CONFIDENCE() (CONFIDENCE.NORM() in newer versions of Excel) 2 marks
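The two sampling errors follow from e = z·σ/√n. The standard deviation σ = 0.33 is an assumption here: it is back-calculated from the quoted values (1.96 × 0.33/30 ≈ 0.02156), not stated in this key. A 95% confidence level is also assumed.

```r
# Assumptions: sigma = 0.33 (back-calculated from the quoted errors) and 95% confidence
sigma <- 0.33
z <- qnorm(0.975)                 # about 1.96

sampling_error <- function(n) z * sigma / sqrt(n)

e900  <- sampling_error(900)      # about 0.02156
e1089 <- sampling_error(1089)     # about 0.0196: larger n, smaller error
c(e900, e1089)
```

Under the same assumptions, the Excel equivalent would be =CONFIDENCE(0.05, 0.33, 900), whose arguments are alpha, the standard deviation and the sample size.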
5c R command(s) for constructing a neatly labelled and colourful histogram, with unequal bins. 4.5 marks
hist(marks, breaks = c(0,33,50,60,75,100), col = "tomato", main = "Number of students scoring marks",
xlab = "Marks", ylab = "Number of Students")
Key options the answer must contain: hist(), breaks, main, xlab, ylab, col
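A runnable version with simulated marks is sketched below; the 60 marks are hypothetical and all lie between 0 and 100, as the breaks require. One detail worth knowing: when the breaks are unequal, hist() plots densities by default (freq = FALSE), so the y-axis is labelled "Density" here; forcing freq = TRUE makes R warn that the bar areas are then misleading.

```r
# Hypothetical marks for 60 students, all within 0 to 100
set.seed(7)
marks <- round(runif(60, min = 0, max = 100))

# Unequal bins (0,33], (33,50], (50,60], (60,75], (75,100]; 0 is included
h <- hist(marks, breaks = c(0, 33, 50, 60, 75, 100), col = "tomato",
          main = "Number of students scoring marks",
          xlab = "Marks", ylab = "Density")

h$counts  # number of students in each bin
```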
5c For visually impaired students: Explanation of the use of the following R commands: getwd() and setwd().
(Gardener, P-35) 4.5 marks
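A short illustration of the two commands: getwd() reports the working directory (where functions such as read.csv() look for relative file names) and setwd() changes it. The temporary directory is used only so the sketch runs anywhere; the original directory is restored at the end.

```r
old_wd <- getwd()   # current working directory
setwd(tempdir())    # change it; setwd() invisibly returns the previous directory
new_wd <- getwd()   # now the temporary directory
setwd(old_wd)       # restore the original working directory
```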
6a Let $\mu_1$ and $\mu_2$ be the population mean revenue earned in City A and City B, respectively. Then the hypotheses are: 2 marks
$H_0: \mu_1 - \mu_2 \le 0$
$H_1: \mu_1 - \mu_2 > 0$
6b The hypotheses to test for a difference in the mean revenue are:
$H_0: \mu_1 - \mu_2 = 0$; $H_1: \mu_1 - \mu_2 \ne 0$ 2 marks
t-stat = 1.76; two-tail t critical value (at the 5% level) = 2.05.
Since the t-stat is less than the critical value, we do not reject H0. Hence, there is no evidence of a difference in the revenue
earned in the two cities, and it is not justified for the firm to focus on one city. 2 marks
6c The p-value for the two-tail test is 0.09.
Since 0.01 < 0.09 < 0.10, do not reject H0 at the 0.01 level of significance, but reject H0 at the 0.10 level of significance. 2 marks
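The critical value in 6b and the p-value in 6c can be recovered from the t distribution. The degrees of freedom are not given in this key; df = 27 is an assumption chosen because its two-tail 5% critical value rounds to 2.05.

```r
# Assumption: df = 27, inferred from the two-tail 5% critical value of 2.05
df <- 27
t_stat <- 1.76

t_crit <- qt(0.975, df)                           # about 2.05
p_two  <- 2 * pt(t_stat, df, lower.tail = FALSE)  # about 0.09

# Decisions at the 1%, 5% and 10% levels: reject H0 only when p < alpha
alphas <- c(0.01, 0.05, 0.10)
reject <- p_two < alphas                          # FALSE, FALSE, TRUE
```

Under the same df assumption, the Excel counterparts would be =T.INV.2T(0.05, 27) for the critical value and =T.DIST.2T(1.76, 27) for the two-tail p-value.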
6d Suppose CityArevenue, CityBrevenue are the variable names.
t.test(CityArevenue, CityBrevenue, var.equal = TRUE, alternative = "greater") 2.5 marks
t.test(CityArevenue, CityBrevenue, var.equal = TRUE, alternative = "greater", conf.level = 0.99) 1 mark
t.test(CityArevenue, CityBrevenue, var.equal = TRUE, alternative = "greater", conf.level = 0.90) 1 mark
Key options to check: t.test(), var.equal, alternative, conf.level
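A runnable sketch of the 6d commands with made-up revenues (the two vectors are hypothetical, not the exam data). It also computes the pooled-variance t statistic by hand to show what var.equal = TRUE does. With alternative = "greater", conf.level only changes the reported one-sided confidence bound; the p-value is the same, and the decision at the 1% or 10% level compares it with 0.01 or 0.10.

```r
# Hypothetical revenue figures (not the exam data)
CityArevenue <- c(52, 48, 55, 60, 47, 58, 53, 50)
CityBrevenue <- c(45, 50, 49, 44, 52, 46, 48, 43)

res   <- t.test(CityArevenue, CityBrevenue, var.equal = TRUE, alternative = "greater")
res99 <- t.test(CityArevenue, CityBrevenue, var.equal = TRUE, alternative = "greater",
                conf.level = 0.99)

# Pooled-variance t statistic computed by hand for comparison
n1 <- length(CityArevenue)
n2 <- length(CityBrevenue)
sp2 <- ((n1 - 1) * var(CityArevenue) + (n2 - 1) * var(CityBrevenue)) / (n1 + n2 - 2)
t_manual <- (mean(CityArevenue) - mean(CityBrevenue)) / sqrt(sp2 * (1 / n1 + 1 / n2))

res
```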

6e Excel functions for the Student's t distribution: T.DIST.2T/T.DIST/TDIST 2 marks
Excel functions for the inverse of the Student's t distribution: T.INV.2T/T.INV/TINV 2 marks
(Levine et al. (2017), P-329).
Full credit is to be given for an explanation of any one of the above functions for the Student's t distribution,
and any one for the inverse of the t distribution.
