Business Analytics – September 2023 – Assignment-1
NOTE-1: ALL COMPUTATIONS FOR QUESTION NUMBERS 1 to 12 MUST BE
PERFORMED IN GOOGLE COLAB ONLY. Answers from other software (like Excel,
PyCharm, Jupyter Notebook, etc.) may differ from the provided answers. Because the
assignment is auto-graded, answers that differ for this reason will not be awarded marks.
NOTE-2: For ALL QUESTIONS in this assignment, input your answers rounded to 4 decimal
places. For example, if your answer is “1.234567”, input the answer as “1.2346”.
A car maker is exploring the distribution of defects in the assembly line. Data on defective
cars produced (a sample of 100 defective cars) on the assembly line are presented in the file
titled GOF_24t2.xlsx. The file contains one column, “obs”, which records the number of
scratches found on each car.
You are hired as the business analyst to assist the car maker in answering the following
questions (Question Numbers 1 to 12).
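As a starting point for the descriptive-statistics questions, the following is a minimal sketch of how the summary measures of a count column might be computed in Colab. The ten values below are hypothetical stand-ins, not the contents of GOF_24t2.xlsx; the real column would have to be loaded from the file first.

```python
import pandas as pd

# Hypothetical stand-in for the "obs" column. In Colab the real data would be
# loaded from the uploaded file, e.g. pd.read_excel("GOF_24t2.xlsx")["obs"].
obs = pd.Series([0, 1, 1, 2, 2, 2, 3, 3, 4, 6], name="obs")

mean = obs.mean()            # sample mean, the estimate of the expected count
median = obs.median()        # middle value of the sorted sample
mode = obs.mode().iloc[0]    # most frequent value (smallest one if modes tie)
skew = obs.skew()            # sample skewness; positive suggests a right tail

print(f"mean={mean:.4f}  median={median:.4f}  mode={mode}  skew={skew:.4f}")
```

Comparing the mean, median, and mode, together with a histogram of the column, is one common way to judge the direction of skew.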
1. Based on the given sample data, the column “obs” can be assumed to have a __ [1 Mark]
a) Discrete distribution
b) Continuous distribution
2. Given the sample data, the expected number of scratches on a car produced on the
assembly line is ___ [1 Mark] [Ans: 3.4 to 3.5]
3. Based on the descriptive statistics (and visualization) of the sample data, the total scratch
length can be assumed to have a ___ [1 Mark]
a) Symmetric distribution as the mean > median > mode
b) Symmetric distribution as the mode > median > mean
c) Right skewed distribution as the mean > median > mode
d) Left skewed distribution as the mode > median > mean
e) Left skewed distribution as the mean > median > mode
f) Right skewed distribution as the mode > median > mean
g) None of the above
4. If you perform a chi-square goodness of fit test for the “obs” column, what is the
probability of observing exactly 5 defects? [2 Marks] [Ans: 0.125 to 0.135]
[Hint-1: Use the descriptive statistics to identify a possible distribution]
[Hint-2: Determine the chi-square test statistic without adding any new bins. Modifying
an existing bin is allowed]
5. If you perform a chi-square goodness of fit test for the “obs” column, what is the value of
the computed test statistic? [2 Marks] [Ans: 6.6 to 6.75]
[Hint-1: Use the descriptive statistics to identify a possible distribution]
[Hint-2: Determine the chi-square test statistic without adding any new bins. Modifying
an existing bin is allowed]
6. If you perform a chi-square goodness of fit test for the “obs” column, what is the p-value
for the test? [2 Marks] [Ans: 0.55 to 0.59]
[Hint-1: Use the descriptive statistics to identify a possible distribution]
[Hint-2: Determine the chi-square test statistic without adding any new bins. Modifying
an existing bin is allowed]
7. How many degrees of freedom are present in the chi-square goodness of fit test to determine
the distribution for the “Number of Scratches” column? [2 Marks] [Ans: 7]
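The goodness of fit workflow behind Questions 4 to 7 can be sketched as follows. This is an illustration only: the observed frequencies and the rate `lam` are hypothetical, the Poisson distribution is an assumed candidate for count data, and folding the upper tail into the last existing bin is just one possible reading of Hint-2.

```python
import numpy as np
from scipy import stats

# Hypothetical observed frequencies for 0, 1, 2, 3, 4 and "5 or more" scratches.
values = np.arange(6)
observed = np.array([5, 14, 24, 22, 18, 17])
n = observed.sum()

# Hypothetical rate; in practice this would be the sample mean of the raw data.
lam = 3.0

# Probability of exactly 5 under the assumed Poisson model.
p_exactly_5 = stats.poisson.pmf(5, lam)

# Bin probabilities: exact pmf for 0..4, upper tail P(X >= 5) for the last bin
# so that the probabilities sum to 1 without adding any new bin.
probs = stats.poisson.pmf(values, lam)
probs[-1] = stats.poisson.sf(values[-1] - 1, lam)
expected = n * probs

# Chi-square statistic: sum over bins of (O - E)^2 / E.
chi2_stat = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: bins - 1 - number of parameters estimated from the data
# (one here, the Poisson rate).
dof = len(observed) - 1 - 1

# p-value: upper-tail probability of the chi-square distribution.
p_value = stats.chi2.sf(chi2_stat, dof)

print(round(p_exactly_5, 4), round(chi2_stat, 4), dof, round(p_value, 4))
```

The same statistic can be cross-checked with `scipy.stats.chisquare(observed, expected, ddof=1)`, where `ddof=1` accounts for the one estimated parameter.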
You are given the following dataset – chiSquareAssignment.csv. Perform the chi-squared
test of independence on the given dataset and answer the following questions.
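If the CSV holds one row per respondent, a contingency table can be built before running the test. The sketch below assumes hypothetical column names `group` and `brand` and a hypothetical second brand "Ace"; the actual layout of chiSquareAssignment.csv may differ, and the rows shown are made up for illustration.

```python
import pandas as pd

# Hypothetical rows; the real data would come from
# pd.read_csv("chiSquareAssignment.csv"). Column names are assumptions.
df = pd.DataFrame({
    "group": ["Student", "Student", "Student", "Student",
              "Employee", "Employee", "Employee", "Employee"],
    "brand": ["Don", "Ace", "Ace", "Ace",
              "Don", "Don", "Don", "Ace"],
})

# Cross-tabulate the two categorical variables into observed counts.
table = pd.crosstab(df["group"], df["brand"])

# Marginal probability of a brand: its column total over the grand total.
total = table.to_numpy().sum()
p_don = table["Don"].sum() / total

print(table)
print(round(p_don, 4))
```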
1. What are the degrees of freedom? [1.5 marks]
a) 2
b) 3
c) 4
d) 5
2. What is the calculated value of chi-squared? [1.5 marks] [Ans: 35 to 36]
3. What is the p-value? [1.5 marks]
a) 8.262639420013194e-08
b) 8.262639420013194e-09
c) 8.262639420013194e-07
d) None of the above
4. From the contingency table, what is the probability that a randomly observed person
prefers the brand Don? [1.5 marks] [Ans: 0.14 to 0.18]
5. What is the expected frequency of people in the Student group preferring the brand Don?
[1.5 marks] [Ans: 63 to 64]
6. At a significance level of 0.01, what can we conclude? [1.5 marks]
a) Reject the null hypothesis and conclude that the categorical variables are
independent
b) Reject the null hypothesis and conclude that the categorical variables are not
independent
c) Fail to reject the null hypothesis and conclude that the categorical variables
are independent
d) Fail to reject the null hypothesis and conclude that the categorical variables
are not independent
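The test of independence itself can be sketched with `scipy.stats.chi2_contingency`, which returns the statistic, the p-value, the degrees of freedom, and the expected frequencies in one call. The 2 x 3 table below is hypothetical and is not the assignment data.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical observed counts: 2 groups (rows) by 3 brands (columns).
observed = np.array([[30, 20, 50],
                     [20, 40, 40]])

chi2_stat, p_value, dof, expected = chi2_contingency(observed)

# Degrees of freedom equal (rows - 1) * (columns - 1).
# Each expected frequency equals row total * column total / grand total.
print(round(chi2_stat, 4), dof)
print(expected)

# Decision rule: reject the null hypothesis of independence when the p-value
# falls below the chosen significance level.
alpha = 0.01
reject = p_value < alpha
print("Reject H0" if reject else "Fail to reject H0")
```

Rejecting the null hypothesis means the data give evidence that the two categorical variables are not independent; failing to reject means no such evidence was found at that significance level.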