
Individual Contributions

Aadit Aggarwal - Hypothesis Testing for Internet Speed, Histogram for Internet Speed,
Stacked Bar Graph for Internet Speed and Age

Khushbu Dawar - Hypothesis Testing for Age, Histogram for Age

Lokesh - Hypothesis Testing for number of family members, Descriptive Statistics

Madhurakshi Pramanik - Hypothesis Testing for power cut hours, Stacked Bar Graph for
family members and power cut hours

Prajwal Udupa V - Hypothesis Testing for Gender, Histogram for power cut hours, Stacked
Bar Graph for Gender and Vaccination Status

Rajat - Hypothesis Testing for Vaccination Status, Histogram for family members

Everybody worked together to create the Google Form survey and the final report.

CONTENTS

1. INTRODUCTION
2. OBJECTIVE OF THE PROJECT
3. STUDY METHOD
4. DATA COLLECTION
5. DATA EXPLORATION
5.1. Descriptive Statistics
5.2. Histograms
5.3. Stacked Bar Graphs
6. HYPOTHESIS TESTING
6.1 Impact of Internet Speed on Students’ Decision
6.2 Impact of Age on Students’ Decision
6.3 Impact of Total Family Members on Students’ Decision
6.4 Impact of Number of Power Cuts in a day on Students’ Decision
6.5 Impact of Gender on Students’ Decision
6.6 Impact of Vaccination Status on Students’ Decision
7. INFERENCES
8. LIMITATIONS AND FUTURE POTENTIAL OF THE PROJECT
9. REFERENCES
10. APPENDIX

1. INTRODUCTION
Data Analysis is the process of systematically applying statistical and/or logical techniques to describe
and illustrate, condense and recap, and evaluate data.

Given the current Covid situation, many students of the PGP25 batch at IIM-K are still trying to make up
their minds on whether or not they should come to the campus. Students weigh various factors before
finally deciding whether to come. Among these, we feel that six factors are important in making this
decision. Through this project, we want to collect data and use the data analysis tools at our disposal to
see which of these six factors are actually critical to the final decision to come to campus.

2. OBJECTIVE OF THE PROJECT

In this project, we aim to collect data from a sample of the total population of the PGP25 batch and use
this sample data to determine whether each of the six variables listed below (4 quantitative and 2
qualitative) has a statistically significant impact on the students’ decision to come to campus.

1. Internet Speed in the student’s house
2. Age of the Student
3. Number of Family Members in the student’s house
4. Number of Power Cut Hours in a day in the student’s house
5. Gender of the Student
6. Vaccination Status of the Student

3. STUDY METHOD

The methodology we used for this project consisted of:

Data Collection: We sourced our data by circulating a Google form among the PGP25 batch to get
information about their decision on whether or not they would come to the campus, and how the
above mentioned six factors affected this decision.

Data Exploration: Using the data collected, we looked at the descriptive statistics and set up
histograms and bar graphs to better understand the trends and distribution within our data. The
insights gained from this exercise guided us in setting up appropriate hypotheses.

Hypothesis Testing: This step involved designing the hypothesis for each variable by defining the
appropriate statistic, filtering out the data on which the tests would be run and finally running the
tests to check the hypothesis.

4. DATA COLLECTION
With the aim of understanding the impact of the above mentioned six parameters on the decision of
students to come to campus, we conducted a survey by circulating a Google form among the PGP25
batch, with the following 7 questions (1 question for the decision variable and 1 question for each of
the 6 predictor variables):

1. Do you want to go to Kampus? [Response Allowed: Yes / No]
2. How Fast is your Internet at home? (in mbps) [Response Allowed: Whole Numbers]
3. What is your Gender? [Response Allowed: Male / Female / Others]
4. How old are you today in years? [Response Allowed: Whole Numbers]
5. What is your vaccination status? [Response Allowed: Fully Vaccinated / Partially Vaccinated / Not
Vaccinated]
6. How many people live with you in the house? [Response Allowed: Whole Numbers]
7. Number of power cuts hours in a day? [Response Allowed: Whole Numbers]

We obtained 100 responses to this survey, and these responses served as the dataset for the analysis
carried out in the project.

Note: As at least half of the PGP24 batch and the PhD students have already come to campus, we decided
to limit the survey to the PGP25 batch.

5. DATA EXPLORATION

To understand how our quantitative variables are distributed, we looked at descriptive statistics and the
histogram for these variables.

5.1. Descriptive Statistics

Descriptive Statistics   Internet speed at   Age today   People living in   Power cut hours
                         home (Mbps)         (years)     the house          in a day
Mean                     36.99               23.21       3.99               1.91
Standard Error           3.00                0.16        0.13               0.23
Median                   27.5                23          4                  1
Mode                     10                  23          4                  0
Standard Deviation       30.04               1.61        1.33               2.33
Sample Variance          902.66              2.59        1.77               5.42
Kurtosis                 2.38                -0.05       0.28               2.74
Skewness                 1.37                0.44        0.41               1.72
Range                    148                 8           7                  10
Minimum                  2                   20          1                  0
Maximum                  150                 28          8                  10
Sum                      3699                2321        399                191
Count                    100                 100         100                100
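
The measures in the table above can also be computed outside Excel. The sketch below is a minimal illustration in Python, assuming the skewness and kurtosis rows follow the sample-adjusted formulas of Excel's SKEW and KURT functions. The function names and the sample values are hypothetical and are not the survey data.

```python
import math
import statistics


def skewness(values):
    """Sample-adjusted skewness (assumed to match Excel's SKEW)."""
    n = len(values)
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)


def kurtosis(values):
    """Sample-adjusted excess kurtosis (assumed to match Excel's KURT)."""
    n = len(values)
    mean, sd = statistics.mean(values), statistics.stdev(values)
    fourth = sum(((v - mean) / sd) ** 4 for v in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def describe(values):
    """Return the descriptive measures listed in the table for one variable."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "Mean": statistics.mean(values),
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(values),
        "Kurtosis": kurtosis(values),
        "Skewness": skewness(values),
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }


# Hypothetical power cut hours for eight respondents (not the survey data).
sample = [0, 0, 0, 1, 2, 3, 5, 10]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The standard error row is the standard deviation divided by the square root of the count, which is consistent with the table (for example, 30.04 / 10 ≈ 3.00 for internet speed).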

5.2. Histograms

After this, we looked at stacked bar graphs to understand how each of the six variables is distributed
differently for students who want to come to campus and those who don’t.

5.3. Stacked Bar Graphs
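
A stacked bar graph is built from a cross-tabulation: for each category of a variable, count how many respondents answered Yes and how many answered No. A minimal sketch of that counting step in Python follows. The function name `crosstab` and the six responses are hypothetical, not the survey data.

```python
from collections import Counter


def crosstab(categories, decisions):
    """Count respondents for each (category, decision) pair."""
    return Counter(zip(categories, decisions))


# Hypothetical responses (not the survey data).
vaccination = ["Fully Vaccinated", "Not Vaccinated", "Fully Vaccinated",
               "Partially Vaccinated", "Fully Vaccinated", "Not Vaccinated"]
decision = ["Yes", "No", "Yes", "No", "No", "Yes"]

counts = crosstab(vaccination, decision)
# Each bar is one category; its two stacked segments are the Yes and No counts.
for category in sorted(set(vaccination)):
    print(category, "Yes:", counts[(category, "Yes")], "No:", counts[(category, "No")])
```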

6. HYPOTHESIS TESTING

We will be carrying out all our hypothesis testing at the 5% level of significance.

6.1 Impact of Internet Speed on Students’ Decision

Null Hypothesis: Mean internet speed of students who want to come to campus >= Mean internet speed
of students who do not want to come to campus.

Alternate Hypothesis: Mean internet speed of students who want to come to campus < Mean internet
speed of students who do not want to come to campus.

t-Test: Two-Sample Assuming Unequal Variances

Decision: Since the p-value of the t-test is less than 0.05, we can reject the null hypothesis.
Conclusion: The mean internet speed for people who want to come to campus is significantly less than
the mean internet speed for people who do not want to come to campus. This factor can be evaluated
further to measure how much impact it has on the decision.
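
Excel's "t-Test: Two-Sample Assuming Unequal Variances" corresponds to Welch's t-test. The sketch below shows how a lower-tailed test of this kind could be computed in Python using only the standard library. The function names, the numerical integration of the t density and the sample values are illustrative assumptions, not the report's workbook or data.

```python
import math
import statistics


def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t, by Simpson's rule on the density over [0, |t|]."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    half = total * h / 3  # area under the density between 0 and |t|
    return 0.5 + half if t >= 0 else 0.5 - half


def welch_lower_tail(sample_1, sample_2):
    """Test H1: mean(sample_1) < mean(sample_2) without assuming equal variances."""
    n1, n2 = len(sample_1), len(sample_2)
    v1 = statistics.variance(sample_1) / n1
    v2 = statistics.variance(sample_2) / n2
    t = (statistics.mean(sample_1) - statistics.mean(sample_2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, t_cdf(t, df)


# Hypothetical internet speeds in Mbps (not the survey data).
want_campus = [10, 20, 30, 40, 50]
stay_home = [20, 30, 40, 50, 60]
t_stat, dof, p_value = welch_lower_tail(want_campus, stay_home)
print(f"t = {t_stat:.3f}, df = {dof:.1f}, one-tailed p = {p_value:.4f}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```

This sketch keeps the fractional degrees of freedom, so its p-value may differ slightly from a tool that rounds the degrees of freedom to a whole number.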

6.2 Impact of Age on Students’ Decision

Null Hypothesis: Mean age of students who want to come to campus >= Mean age of students who do
not want to come to campus.

Alternate Hypothesis: Mean age of students who want to come to campus < Mean age of students who
do not want to come to campus

t-Test: Two-Sample Assuming Unequal Variances

Decision: Since the p-value of the t-test is greater than 0.05, we fail to reject the null hypothesis.
Conclusion: The mean age of people who want to come to campus is not significantly less than the
mean age of people who do not want to come to campus.

6.3 Impact of Total Family Members on Students’ Decision

Null Hypothesis: Mean number of family members at home for students who want to come to campus <=
Mean number of family members at home for students who do not want to come to campus.

Alternate Hypothesis: Mean number of family members at home for students who want to come to
campus > Mean number of family members at home for students who do not want to come to campus.

t-Test: Two-Sample Assuming Unequal Variances

Decision: Since the p-value of the t-test is less than 0.05, we can reject the null hypothesis.
Conclusion: The mean number of family members for people who want to come to campus is
significantly more than the mean number of family members for people who do not want to come to
campus. This factor can be evaluated further to measure how much impact it has on the decision.

6.4 Impact of Number of Power Cuts in a day on Students’ Decision

Null Hypothesis: Mean number of power cut hours in a day for students who want to come to campus <=
Mean no. of power cut hours in a day for students who do not want to come to campus.

Alternate Hypothesis: Mean number of power cut hours in a day for students who want to come to
campus > Mean no. of power cut hours in a day for students who do not want to come to campus.

t-Test: Two-Sample Assuming Unequal Variances

Decision: Since the p-value of the t-test is less than 0.05, we can reject the null hypothesis.
Conclusion: The mean number of power-cut hours for people who want to come to campus is
significantly more than the mean number of power-cut hours for people who do not want to come to
campus. This factor can be evaluated further to measure how much impact it has on the decision.

6.5 Impact of Gender on Students’ Decision

Null Hypothesis: The proportion of males who want to come to campus (p1) = The proportion of females
who want to come to campus (p2).

Alternate Hypothesis: The proportion of males who want to come to campus (p1) ≠ The proportion of
females who want to come to campus (p2).

z-Test
p1 0.545455
p2 0.488889
p 0.52
z statistic 0.563272
p value 0.57325

Decision: Since the p-value of the z-test is greater than 0.05, we fail to reject the null hypothesis.
Conclusion: The proportion of males who want to come to campus does not significantly differ
from the proportion of females who want to come to campus.

Note: As np and n(1-p) are both greater than 5 for each group in our case, we can treat the sampling
distribution of the sample proportions as approximately normal.
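
The figures in the z-test table can be checked with a short pooled two-proportion z-test. The sketch below is written in Python. The counts of 30 out of 55 males and 22 out of 45 females are not stated in the report; they are inferred here from the reported proportions and the 100 responses, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z(yes_1, n_1, yes_2, n_2, two_tailed=True):
    """Pooled two-proportion z-test; returns (z, p_value)."""
    p_1, p_2 = yes_1 / n_1, yes_2 / n_2
    pooled = (yes_1 + yes_2) / (n_1 + n_2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / std_error
    upper_tail = 1 - NormalDist().cdf(z)
    if two_tailed:
        return z, 2 * min(upper_tail, 1 - upper_tail)
    return z, upper_tail  # one-tailed test of H1: p_1 > p_2


# Counts inferred from the reported proportions (an assumption, not given in the report).
z, p_value = two_proportion_z(30, 55, 22, 45)
print(f"z = {z:.6f}, p = {p_value:.5f}")
```

Passing `two_tailed=False` gives the upper-tail p-value needed when the alternate hypothesis is one-sided.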

6.6 Impact of Vaccination Status on Students’ Decision

Null Hypothesis: The proportion of vaccinated people who want to come to campus (p1) <= The
proportion of partially/non-vaccinated people who want to come to campus (p2).

Alternate Hypothesis: The proportion of vaccinated people who want to come to campus (p1) > The
proportion of partially/non-vaccinated people who want to come to campus (p2).

z-Test
p1 0.5774648
p2 0.3793103
p 0.52
z statistic 1.7997397
p value 0.0359509

Decision: Since the p-value of the z-test is less than 0.05, we can reject the null hypothesis.
Conclusion: The proportion of vaccinated people who want to come to campus is significantly more than
the proportion of non-vaccinated and partially vaccinated people who want to come to the campus.

Note: As np and n(1-p) are both greater than 5 for each group in our case, we can treat the sampling
distribution of the sample proportions as approximately normal.
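
The normal approximation behind the z-test is usually justified by checking that the expected numbers of Yes and No responses in each group are at least 5. A minimal sketch of that check in Python follows, with group sizes of 71 and 29 inferred from the reported proportions (an assumption, not given in the report) and the pooled proportion 0.52 taken from the table. The function name is hypothetical.

```python
def normal_approximation_ok(group_sizes, pooled_p, threshold=5):
    """True when n*p and n*(1 - p) reach the threshold in every group."""
    return all(
        n * pooled_p >= threshold and n * (1 - pooled_p) >= threshold
        for n in group_sizes
    )


# Group sizes inferred from the reported proportions (an assumption).
vaccinated_n, other_n = 71, 29
pooled_p = 0.52  # pooled proportion from the z-test table

for n in (vaccinated_n, other_n):
    print(n, round(n * pooled_p, 2), round(n * (1 - pooled_p), 2))
print(normal_approximation_ok([vaccinated_n, other_n], pooled_p))
```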

7. INFERENCES

1. Three quantitative factors - internet speed in the student’s house, number of family members
in the student’s house and number of power cut hours in a day in the student’s house - have a
statistically significant impact on students’ decision to go to campus.

2. One qualitative factor - vaccination status of the student - has a statistically significant impact
on students’ decision to go to campus.

8. LIMITATIONS AND FUTURE POTENTIAL OF THE PROJECT

1. Since the analysis is based on the primary data collected from the survey, the results are only as
good as the quality of the responses. Some respondents might not have exercised reasonable
judgement while answering the questions, which would affect our analysis.

2. We have considered limited variables in our analysis. We could have included more variables to
better predict the behavior of the students wanting to come to campus.

3. We have not measured the actual impact that the variable might have over the decision to come
to campus. Each factor will have a different impact over the decision, which we haven't measured
in our analysis.

4. Our sample size is 100. This is approximately 18% of the total batch size of PGP25. A larger sample
size would give more reliable results.

5. Only the PGP25 students are under the purview of our analysis. We haven't surveyed the PGP24
students. We could have included them and analyzed the difference between the 2 groups of
students.
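
The effect of sample size noted in point 4 can be made concrete with the margin of error for a proportion. The sketch below, in Python, assumes a batch size of about 556 (derived from 100 responses being roughly 18% of the batch), uses the pooled proportion 0.52 reported in the z-tests, and applies a finite population correction. The function name and the alternative sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(p, n, population, confidence=0.95):
    """Half-width of the confidence interval for a proportion, with finite population correction."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    fpc = math.sqrt((population - n) / (population - 1))
    return z * math.sqrt(p * (1 - p) / n) * fpc


batch_size = 556  # assumed: 100 responses are about 18% of the batch
for n in (100, 200, 300):
    print(n, round(margin_of_error(0.52, n, batch_size), 4))
```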

9. REFERENCES

1. Business Statistics by Jaggia and Kelly
2. PPT slides by Professor Shovan Chowdhury
3. Analysis Tool Pack for Excel

10. APPENDIX
Excel Data file with analysis:

• https://docs.google.com/spreadsheets/d/1x-lqhEVkereaHJBTH6RGTaxEADJyxQKw/edit?usp=sharing&ouid=114372488074375833264&rtpof=true&sd=true
