RESEARCH METHODOLOGY

(PRACTICAL FILE)
(BCom(H) 308 )

INSTITUTE OF INFORMATION TECHNOLOGY AND MANAGEMENT


NEW DELHI 110058
BATCH (2020-2023)

SUBMITTED TO
1. MS. SHABNAM PARVEEN (ASSISTANT PROFESSOR)
2. MR. RAGHAV JAIN (ASSISTANT PROFESSOR)

SUBMITTED BY
Vanshita Manchanda
35621188820
BATCH
SEMESTER 4E
Index

Sr. No. Particulars Dates Page No. Signature of Faculty
PHASE 1

1 Assignment 1

2 Assignment 2

3 Assignment 3

4 Exercise 1

5 Exercise 2

6 Exercise 3

7 Exercise 4

PHASE 2

8 Assignment 4

9 Exercise 5

10 Exercise 6

11 Exercise 7

12 Exercise 8

PHASE 3

13 Assignment 5

14 Exercise 9

15 Exercise 10
Phase 1
Assignment 1

Give an Introduction to Advanced Excel: Features and Applications.


Advanced Excel refers to the features and functions of Microsoft Excel that help the user to
perform complex calculations, carry out data analysis, and much more.

Advanced Excel differs from basic Excel in that the user's focus is more on DSUM, DCOUNT, Pivot
Tables, Pivot Charts, formulas, functions, and macros. Other important concepts to explore while
working with Advanced Excel include IF statements and SUMPRODUCT.

⚫ Advanced Excel users know how to gather, structure & present their data so that it
looks impressive.

⚫ Advanced knowledge of Excel means possessing the ability to use spreadsheets,
graphing, tables, calculations, and automation efficiently to process large quantities of
data relevant to business tasks.

⚫ Advanced Excel skills are all about mastery over formulas and other Excel features for
handling complex tasks. Experts can use Excel for more advanced purposes like data
analytics and simulation.

⚫ Microsoft Excel can be a useful platform to enter and maintain research study data.
Excel is fairly easy to learn and use. Researchers can use Excel's simple statistical and
plotting functions to help gain insight into their data. However, most research projects
require more extensive statistical techniques that can be most easily performed using
additional statistical software packages such as SPSS software.

Topics Covered in Advanced MS Excel:


● Data Validation.
● Logical Functions.
● What-if Analysis.
● Goal Seek.
● Lookup functions.
● Pivot Tables.
● Array Functions.
● Excel Dashboard.
Features And Applications
1. Fuzzy Matching

Fuzzy matching is a Microsoft Excel productivity feature that allows you to check through related items across
different lists and merge them if they are approximately similar.

For instance, if you are carrying out a survey and some entries have typing errors, a fuzzy match will count
them together with the correctly spelled entries as long as the spelling remains as close as possible to the
correct one.

To make your matches more accurate, you can set your preferred Similarity Threshold for the
fuzzy match. However, you should know that the fuzzy matching feature only works on text columns.
Here’s how to set up a fuzzy match:

1. First, make sure you have Fuzzy Lookup installed and enabled on Excel.
2. The next step is to turn your list into a table. You will achieve this by highlighting your list and
pressing Ctrl + T.
3. Once you do this, you will see the Fuzzy Lookup tool appear on the ribbon.
4. Now, select which of the converted tables you want to compare.
5. You will then see a pop-up asking you to select the Similarity Threshold you want for the comparison;
a threshold of 0.85 isn’t so bad.
6. Lastly, choose a cell where the Fuzzy Lookup Table will be inserted, and click Go on the Lookup
Tool to complete your comparison.
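The idea of a similarity threshold can be illustrated outside Excel. The following is a minimal Python sketch, not the algorithm used by the Fuzzy Lookup add-in: it scores text similarity with the standard library's difflib and accepts a match only at or above 0.85. The city names, the survey responses and the helper names similarity and best_match are assumptions made up for this illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0 and 1, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(entry, canonical, threshold=0.85):
    """Return the closest canonical value, or None if no score reaches the threshold."""
    scored = [(similarity(entry, c), c) for c in canonical]
    score, value = max(scored)
    return value if score >= threshold else None


# Hypothetical survey answers; "New Dehli" contains a typing error.
cities = ["New Delhi", "Mumbai"]
responses = ["New Delhi", "New Dehli", "Mumbai", "Chennai"]

matched = [best_match(r, cities) for r in responses]
print(matched)
```

With this data the misspelled entry is counted together with "New Delhi", while "Chennai" stays unmatched because it is not close to either city. Lowering the threshold makes the match more tolerant but increases the risk of merging entries that are really different.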
2. New Window and Arrange

If your work requires you to deal with multiple worksheets, then this is a feature you will find helpful.
Microsoft Excel allows you to open various windows and arrange them as you wish for ease of access.

With all the tabs you need open, and in your view, you can save yourself the time and hassle of shuffling
between multiple windows. It will also go a long way in minimizing errors and confusion.

To use the arrange feature, here are the steps you need to follow:

1. Open your desired workbooks, then click on the worksheets you want to open.
2. Click on the View tab, then Arrange All.
3. Select the option that best suits you on the dialogue box that appears.
4. Click OK.

3. Text to Columns

So you have been making changes to your worksheet when suddenly you have to split the data from one
column into different ones. Say you want to have the first and second names in two separate columns. It seems
hectic, right? Fortunately, you do not have to copy-paste your data cell by cell with the text to column feature.

It allows you to separate the text in your columns using a fixed width or a delimiter such as a comma,
hyphen, or space. So if the entries in your column contain any of these, the feature will enable you to shift part of
the data to a new column.

Here is how to use the text to columns feature:

1. Click on the column that has your intended text.
2. Click on Data, then select Text to Columns.
3. A dialogue box will appear showing the Convert Text to Columns Wizard. Click
on Delimited > Next.
4. Choose the Delimiters that apply to your data. This may be Space or Comma.
5. Once done, click on Next and choose the data Destination on your worksheet.
6. Then click on Finish.


You should note that this feature is also practical for other data, such as dates or serial numbers. The key is to
ensure that your delimiters are all in the correct place before you try to split the data.
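The splitting that Text to Columns performs on a delimiter can be sketched in a few lines of Python. This is only an illustration of the idea, not how Excel implements it; the function name text_to_columns, the surnames and the sample date are assumptions.

```python
def text_to_columns(values, delimiter=" ", max_columns=2):
    """Split each cell on the delimiter into at most max_columns parts."""
    rows = []
    for value in values:
        parts = [p.strip() for p in value.split(delimiter, max_columns - 1)]
        # Pad short rows so every row ends up with the same number of columns.
        parts += [""] * (max_columns - len(parts))
        rows.append(parts)
    return rows


# Hypothetical full names to be split into first and second name.
names = ["Aman Verma", "Shalu Singh", "Ram"]
print(text_to_columns(names))

# The same idea applied to a date with a hyphen delimiter.
print(text_to_columns(["12-05-2022"], delimiter="-", max_columns=3))
```

An entry without the delimiter, such as "Ram", keeps its text in the first column and leaves the second one empty, which is why the delimiters should be checked before splitting.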

4. Import Statistics From Websites

Transferring data from a particular website to your Excel sheet can also be a pain point, but not anymore. With
the Import Stats feature, it’s all so simple. Here's what you need to do to get this feature up and running.

First, you need to open the Excel sheet into which you want to import data.

1. Click on File, and on the drop-down menu, select Open. Then choose Add a place.

Once you do this, a dialogue box appears asking you which files you want to import to Excel.

2. Go down to the question bar, add the URL of the site you want to import data from, and click Open.

It will take a second or two, and then another dialogue box will open asking you to input your
Windows security key. This will happen if the PC you are using has a login password when you start
Windows.

3. Enter your Windows username and login password, then click OK.

5. Remove Duplicates Feature

The last thing you want when dealing with data is redundancy. Unfortunately, after transferring data from one
column to another or from a different site, you may run the risk of having things replicated. To avoid this, you
can use the Remove Duplicates feature in Excel.

1. Select the table, then go to the Data tab.

2. Here, click on Remove Duplicates.

This will prompt a window to open. The pop-up window will ask which columns you want to
scan for duplicates. Select the necessary columns, and Excel will remove the duplicate rows.
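The logic behind removing duplicates can be sketched as follows: a row counts as a duplicate when the values in the chosen columns have already been seen, and the first occurrence is kept. This is an illustrative Python sketch with hypothetical rows and a hypothetical function name, not Excel's own implementation.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each distinct combination of the key columns."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical table with one repeated row.
rows = [
    {"name": "Aman", "post": "Associate"},
    {"name": "Ram", "post": "Analyst"},
    {"name": "Aman", "post": "Associate"},
]

print(remove_duplicates(rows, ["name", "post"]))
```

Choosing fewer key columns makes the check stricter: scanning only the "name" column would treat two people with the same name as duplicates even if their posts differ.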
6. Custom Lists

Custom lists are an efficient way of avoiding tedious data entry and the risk of errors. Creating a custom list
ahead of time allows you to add a drop-down selection or use Excel’s autofill feature, thus saving you time.
Assignment 2

What do you mean by data and types of data? Explain it with the help of suitable
examples and also explain sources of different types of data?
● Data is a set of values of subjects with respect to qualitative or quantitative variables.
● Data is raw, unorganized facts that need to be processed. Data can be something simple and
seemingly random and useless until it is organized.
● When data is processed, organized, structured or presented in a given context so as to make it
useful, it is called information.
● Information necessary for research activities is obtained in different forms.
● The main forms of the information available are:
1. Primary data
2. Secondary data
3. Cross-sectional data
4. Categorical data
5. Time series data
6. Spatial data
7. Ordered data

Qualitative data refers to information about qualities, or information that cannot be measured. It’s usually
descriptive and textual. Examples include someone’s eye colour or the type of car they drive. In surveys, it’s
often used to categorise ‘yes’ or ‘no’ answers.

Sources of Qualitative Data

Although qualitative data is much more general than quantitative, there are still a number of common
techniques for gathering it. These include:

● Interviews, which may be structured, semi-structured or unstructured;

● Focus groups, which involve multiple participants discussing an issue;

● ‘Postcards’, or small-scale written questionnaires that ask, for example, three or four focused questions
of participants but allow them space to write in their own words;

● Secondary data, including diaries, written accounts of past events, and company reports; and

● Observations, which may be on site, or under ‘laboratory conditions’, for example, where participants
are asked to role-play a situation to show what they might do.

Quantitative data
Quantitative data is numerical. It’s used to define information that can be counted. Some examples of
quantitative data include distance, speed, height, length and weight. It’s easy to remember the difference
between qualitative and quantitative data, as one refers to qualities, and the other refers to quantities.
A bookshelf, for example, may have 100 books on its shelves and be 100 centimetres tall. These are
quantitative data points. The colour of the bookshelf – red – is a qualitative data point.

Source of quantitative data

The most common sources of quantitative data include:

● Surveys, whether conducted online, by phone or in person. These rely on the same questions being asked in the
same way to a large number of people;
● Observations, which may either involve counting the number of times that a particular phenomenon occurs, such
as how often a particular word is used in interviews, or coding observational data to translate it into numbers;
and
● Secondary data, such as company accounts.

Explain the different scales of measurement?


Scales of measurement
The first decision to be made in operationalizing a construct is the intended level of
measurement. Levels of measurement, also called rating scales, refer to the values that an indicator can take
(but say nothing about the indicator itself). For example, male and female (or M and F, or 1 and 2) are two
levels of the indicator “gender.”
The four scales of measurement
By understanding the scale of the measurement of their data, data scientists can determine the kind of
statistical test to perform.

1. Nominal scale of measurement

The nominal scale of measurement defines the identity property of data. This scale has certain characteristics,
but doesn’t have any form of numerical meaning. The data can be placed into categories but can’t be
multiplied, divided, added or subtracted from one another. It’s also not possible to measure the difference
between data points.

Examples of nominal data include eye colour and country of birth. Nominal data can be broken down again
into three categories:

● Nominal with order: Some nominal data can be sub-categorised in order, such as “cold, warm, hot and
very hot.”
● Nominal without order: Nominal data can also be sub-categorised as nominal without order, such as
male and female.
● Dichotomous: Dichotomous data is defined by having only two categories or levels, such as ‘yes’ and
‘no’.

2. Ordinal scale of measurement

The ordinal scale defines data that is placed in a specific order. While each value is ranked, there’s no
information that specifies what differentiates the categories from each other. These values can’t be added to or
subtracted from.

An example of this kind of data would include satisfaction data points in a survey, where ‘one = happy, two =
neutral, and three = unhappy.’ Where someone finished in a race also describes ordinal data. While first place,
second place or third place shows what order the runners finished in, it doesn’t specify how far the first-place
finisher was in front of the second-place finisher.

3. Interval scale of measurement

The interval scale contains properties of nominal and ordered data, but the difference between data points can
be quantified. This type of data shows both the order of the variables and the exact differences between the
variables. They can be added to or subtracted from each other, but not multiplied or divided. For example, 40
degrees is not 20 degrees multiplied by two.

This scale is also characterised by the fact that zero is a point on the scale rather than the absence of the
quantity being measured. For example, if you measure temperature in degrees Celsius, zero degrees is still
a temperature.

Data points on the interval scale have the same difference between them. The difference on the scale between
10 and 20 degrees is the same as between 20 and 30 degrees. This scale is used to quantify the difference
between variables, whereas the other two scales are used to describe qualitative values only. Other examples of
interval scales include the year a car was made or the months of the year.

4. Ratio scale of measurement

Ratio scales of measurement include properties from all four scales of measurement. The data is nominal and
defined by an identity, can be classified in order, contains intervals and can be broken down into exact value.
Weight, height and distance are all examples of ratio variables. Data in the ratio scale can be added, subtracted,
divided and multiplied.

Ratio scales also differ from interval scales in that the scale has a ‘true zero’. The number zero means that
none of the quantity is present. An example of this is height or weight: zero centimetres or zero kilos means
no height or weight at all, and nobody can be negative centimetres tall or weigh negative kilos. Examples of
the use of this scale are calculating shares or sales. Of all types of data on the scales of measurement, data
scientists can do the most with ratio data points.

A rating scale is defined as a closed-ended survey question used to represent respondent feedback in
a comparative form for specific features, products or services.
ASSIGNMENT 3
What do you mean by central tendency and importance of mean median mode in
research?

Central tendency is a descriptive summary of a dataset through a single value that reflects the centre of the
data distribution. Along with the variability (dispersion) of a dataset, central tendency is a branch of
descriptive statistics.
The central tendency is one of the most quintessential concepts in statistics. Although it does not provide
information regarding the individual values in the dataset, it delivers a comprehensive summary of the whole
dataset.

● Mean - The mean is the arithmetic average, and it is probably the measure of central tendency that you
are most familiar with. Calculating the mean is very simple. You just add up all of the values and divide by
the number of observations in your dataset.

● Median - The median is the middle value. It is the value that splits the dataset in half. To find the
median, order your data from smallest to largest, and then find the data point that has an equal number
of values above it and below it. The method for locating the median varies slightly depending on
whether your dataset has an even or odd number of values.

● Mode - The mode is the value that occurs the most frequently in your data set. On a bar chart, the mode
is the highest bar. If the data have multiple values that are tied for occurring the most frequently, you
have a multimodal distribution. If no value repeats, the data do not have a mode.
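The three measures can be checked on a small example. The sketch below uses Python's standard statistics module with hypothetical values chosen only for illustration; it shows the odd and even cases for the median and how ties are reported for the mode.

```python
import statistics

# Hypothetical observations, used only to illustrate the three measures.
odd_count = [3, 7, 7, 9, 12]        # five values
even_count = [3, 7, 7, 9, 12, 20]   # six values

# Mean: sum of the values divided by the number of observations.
print(statistics.mean(odd_count))      # 38 / 5 = 7.6

# Median: the middle value for an odd count,
# the average of the two middle values for an even count.
print(statistics.median(odd_count))    # 7
print(statistics.median(even_count))   # (7 + 9) / 2 = 8.0

# Mode: multimode lists every value tied for the highest frequency.
print(statistics.multimode([1, 1, 2, 2, 3]))   # [1, 2]
print(statistics.multimode([4, 8, 15]))        # [4, 8, 15]
```

Note that when no value repeats, multimode returns every value because all of them are tied at a frequency of one; in the sense used above, such data have no mode.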

Explain the measures of variance?

Measures of variation in statistics are ways to describe the distribution or dispersion of your data. In other
words, it shows how far apart data points are from each other. Statisticians use measures of variation to
summarize their data. You can draw many conclusions by using measures of variation, such as high and low
variability. High variability can mean that the data is less consistent while low variability data is more
consistent. You can use measures of variation to measure, analyze or describe trends in your data, which can
apply to many careers that use statistics.

Three measures of variation are:


i. Range
ii. Standard deviation
iii. Variance

Range

Range is one of the simplest measures of variation. It is the lowest point of data subtracted from the highest
point of data. For example, if your highest point is 10 and your lowest point is 3, then your range would be 7.
The range tells you a general idea of how widely spread your data is. Because range is so simple and only uses
two pieces of data, consider using it with other measures of variation so you have a variety of ways to measure
and analyze the variability of your data.

The range of a data set is the number R defined by the formula

$R = x_{\max} - x_{\min}$

where $x_{\max}$ is the largest measurement in the data set and $x_{\min}$ is the smallest.

Standard deviation
Standard deviation is the average or standard distance between each point of data and the mean. It is the
standard amount of variability in your data set. If you know the variance of your data set, then you can take the
square root of that value to find the standard deviation. You can also calculate the standard deviation
directly. If you have the data for a total population, the equation is:

$\sigma = \sqrt{\dfrac{\sum (X - \mu)^2}{N}}$

where:

σ = population standard deviation

∑= sum of

X = each value

µ = population mean

N = number of values in the population

Variance

Variance is the average of the squared deviations of values from the mean. It compares every value to the
mean, which is why variance differs from the range. Variance also displays the spread of
the data set. Typically, the more spread out your data is, the larger the variance is. Statisticians use variance to
compare pieces of data to one another to see how they relate. Variance is standard deviation squared, which
means that variance is expressed in squared units of the original data.
The sample variance of a set of n sample data is the number $s^2$ defined by the formula

$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

which by algebra is equivalent to the formula

$s^2 = \dfrac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n - 1}$
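The formulas for range, population standard deviation and sample variance can be verified numerically. The following Python sketch uses a small hypothetical data set (highest value 10, lowest value 3, as in the range example) and checks that the two forms of the sample variance agree with each other and with the standard statistics module.

```python
import math
import statistics

# Hypothetical data set, used only to illustrate the formulas.
data = [3, 5, 7, 10]
n = len(data)
mean = sum(data) / n

# Range: largest measurement minus smallest measurement.
data_range = max(data) - min(data)

# Sample variance, definition form: sum of squared deviations over n - 1.
var_definition = sum((x - mean) ** 2 for x in data) / (n - 1)

# Sample variance, shortcut form: (sum of x^2 - (sum of x)^2 / n) over n - 1.
var_shortcut = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

# Population standard deviation: square root of squared deviations over N.
sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / n)

print(data_range)       # 7
print(var_definition)   # about 8.92
print(var_shortcut)     # same value as the definition form
print(sigma)            # about 2.59
```

The sample variance divides by n - 1 while the population formula divides by N, which is why the sample standard deviation of the same numbers is slightly larger than the population value.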
EXERCISE – 1
Research methodology

S.No. Questions

1. Collect data related to any topic from either your family, friends
or classmates (at least 5). Prepare a pivot table using the data collected.
2. Use data of population in India from different age groups of any year.
Find out mean, median and mode.
3. Using the define-variable functions, show each step of defining the status
Rejected/Accepted by using the variables.

NAME      POST
Aman      Associate
Shalu     Senior Assistant
Ram       Analyst
Shankar   IT
Amaal     Associate

4. Take the data of literacy rate of males and females of different states (any year) and show
the average literacy rate among males and the average literacy rate among females.
Also compare the data and find whether males or females have the higher literacy rate.

Solution –
Answer 1.
Answer 2.

Answer 3.
Answer 4.
EXERCISE 2
S.No. Questions

1.

Solution-
Pie chart - A pie chart is a circle that is divided into areas, or slices. Each slice represents the
count or percentage of the observations of a level for the variable. Pie charts are often used in
business. Examples include showing percentages of types of customers, percentage of
revenue from assorted products, and profits from different countries.

Line chart - A line chart connects a series of data points with a continuous line. In finance it is
the most basic type of chart: a graphical representation of an asset's historical price action that
typically depicts only a security's closing price over time. Line charts
can be used for any timeframe, but they most often make use of day-to-day price changes.

Scatter chart - A scatter plot (aka scatter chart, scatter graph) uses dots to represent values for two
different numeric variables. The position of each dot on the horizontal and vertical axis indicates
values for an individual data point. Scatter plots are used to observe relationships between variables.

Column chart - A column chart is a data visualization where each category is represented by a
rectangle, with the height of the rectangle being proportional to the values being plotted.
Column charts are also known as vertical bar charts.
EXERCISE 3
S.No Questions

1. Prepare a data set based on height of 10 students (male, female). Calculate mean,
median, mode, standard deviation, variance, range and also draw a chart of any
type.

SOLUTION-
Student   Height of student (male)   Height of student (female)
a         168                        155
b         184                        150
c         150                        145
d         145                        140
e         160                        160
f         175                        155
g         175                        165
h         166                        168
i         156                        162
j         145                        159

Statistic                  Male       Female
Mean                       162.4      155.9
Standard Error             4.261455   2.790659
Median                     163        157
Mode                       145        155
Standard Deviation         13.4759    8.824839
Sample Variance            181.6      77.87778
Kurtosis                   -1.17639   -0.40619
Skewness                   0.089789   -0.53973
Range                      39         28
Minimum                    145        140
Maximum                    184        168
Sum                        1624       1559
Count                      10         10
Confidence Level (95.0%)   9.640081   6.312909
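The main figures in the descriptive statistics output can be cross-checked independently of Excel. The sketch below recomputes them from the ten male and ten female heights using Python's standard statistics module; the helper name describe is an assumption made for this illustration.

```python
import statistics

# Heights of the ten male and ten female students (a to j) from the data set.
male = [168, 184, 150, 145, 160, 175, 175, 166, 156, 145]
female = [155, 150, 145, 140, 160, 155, 165, 168, 162, 159]


def describe(values):
    """Return the measures asked for in the exercise, using sample formulas."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),
        "stdev": statistics.stdev(values),        # sample standard deviation
        "variance": statistics.variance(values),  # sample variance
        "range": max(values) - min(values),
    }


for label, values in (("male", male), ("female", female)):
    print(label, describe(values))
```

The recomputed mean, median, standard deviation, sample variance and range agree with the output listed for both groups. One point worth noting is that the male heights have two modes, 145 and 175, each occurring twice; the output lists only 145.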
CHART

[Chart: height of student (male) and height of student (female) for students a to j; vertical axis from 0 to 200]

EXERCISE 4

S.No Questions

1. Prepare a questionnaire to study the student perception. Identify the factors behind
student’s perception. Prepare a research model for the same. (write constructs, attributes,
dependent and independent variables)
2. Collect the data from 30 students by preparing a Google Form and analyze it by calculating
mean, median and mode.

Answer 1.
SOLUTION-
1. Academics
● COLLEGE GRADING
● QUALIFIED TEACHERS
● STUDENT EXCHANGE PROGRAMS
● SCHOLARSHIPS
2. Environment
● HYGIENE
● CANTEEN
● GREENERY
● LEARNING ENVIRONMENT
● DISCIPLINED CROWD
● SAFETY FOR STUDENTS
● EDUCATIONAL TRIPS
● GOOD WORKING CONDITIONS
3. Infrastructure
● LABS
● SMART CLASSES
● AIR CONDITIONED CLASSES
● MEDICAL ROOM
● AUDITORIUM
● PLAYGROUND
● PARKING
● LIFT
● SCANNERS
4. Activities
● CURRICULUM ACTIVITIES
● PLACEMENT
● FEST
● SPORTS
Research model: factors affecting student perception
● Academics: fees, college grading, scholarship, qualified teachers
● Infrastructure: labs, smart classes, auditorium, parking, lift
● Environment: hygiene, greenery, learning environment, canteen, safety for students
● Activities: curriculum activities, placement, fest, sports

Preparing questionnaire
Part A
Name -            Gender -
Contact no. -     Email id -
Course year -

Part B
Scale: 1. Strongly disagree  2. Disagree  3. Neutral  4. Agree  5. Strongly agree

Constructs 1 2 3 4 5
ACADEMICS
1. Are you satisfied with the fees of your college?
2. How would you rate your scholarship program?
3. Are you satisfied with your teachers?
4. Are you satisfied with the student exchange program?
INFRASTRUCTURE
1. Are you satisfied with your classroom?
2. How would you rate the electricity facilities?
3. Are you satisfied with the smart classes?
4. Are you satisfied with your canteen?
ACTIVITIES
1. Are you happy with the industrial visits?
2. Are you happy with the webinars conducted in college?
3. Are you satisfied with the sports facilities?
4. Are you satisfied with the other various events?
ENVIRONMENT
1. Are you satisfied with the safety of students?
2. Are you satisfied with the hygiene factor of your college?
3. Are you satisfied with the eco-friendly environment?
4. Are you satisfied with the learning environment?

Answer 2
Factors Affecting
S.No. Academics Infrastructure Environment Activities
1 45 56 39 52
2 78 48 41 32
3 90 21 36 94
4 56 66 95 75
5 34 96 75 57
6 23 87 14 77.9
7 56 35 85 83.2
8 89 41 33 88.5
9 77 21 22 93.8
10 75 20 85 99.1
11 88 88 47 10
12 44 99 89 50
13 32 74 41 48
14 99 62 95 74
15 25 35 33 22
16 76 33 66 45
17 64 47 45 66
18 36 36 12 45
19 37 95 33 32
20 11 88 2 84
21 90 45 5 72
22 98 75 8 60
23 56 95 99 9
24 65 35 45 39
25 37 53 66 47
26 41 51 28 46
27 31 27 84 25
28 83 11 86 12
29 39 9 76 32
30 79 47 51 47
Mean 58.5 53.2 51.2 53.917
Median 56 47.5 45 49
Mode 56 35 33 32
Phase 2

ASSIGNMENT 4

What do you understand by “t test”? What are the types of t test. Explain one tailed and
two tailed concepts in t test hypothesis testing.

The t test tells you how significant the differences between group means are. It lets you know if those
differences in means could have happened by chance. The t test is usually used when data sets follow a normal
distribution but you don’t know the population variance.
For example, you might flip a coin 1,000 times and find the number of heads follows a normal distribution for
all trials. So you can calculate the sample variance from this data, but the population variance is unknown. Or,
a drug company may want to test a new cancer drug to find out if it improves life expectancy. In an
experiment, there’s always a control group (a group who are given a placebo, or “sugar pill”). So while the
control group may show an average life expectancy of +5 years, the group taking the new drug might have a
life expectancy of +6 years. It would seem that the drug might work, but the difference could be due to a fluke. To test
this, researchers would use a Student’s t-test to find out if the results are repeatable for an entire population.

A one-tailed test is a statistical test in which the critical area of a distribution is one-sided so that it is either
greater than or less than a certain value, but not both. If the sample being tested falls into the one-sided
critical area, the alternative hypothesis will be accepted instead of the null hypothesis.

A two-tailed test, in statistics, is a method in which the critical area of a distribution is two-sided and tests
whether a sample is greater than or less than a certain range of values. It is used in null-hypothesis testing and
testing for statistical significance. If the sample being tested falls into either of the critical areas, the
alternative hypothesis is accepted instead of the null hypothesis.
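The difference between the two kinds of critical area can be made concrete with a decision rule. The Python sketch below assumes 10 degrees of freedom and a 5% level of significance, for which a standard t table gives critical values of 1.812 (one-tailed) and 2.228 (two-tailed); the t statistic of 2.0 and the function name reject_null are hypothetical.

```python
# Critical values from a standard t table for 10 degrees of freedom
# at the 5% level of significance (assumed for this illustration).
T_CRIT_ONE_TAILED = 1.812   # all 5% in a single tail
T_CRIT_TWO_TAILED = 2.228   # 2.5% in each of the two tails


def reject_null(t_stat, tail):
    """Return True if t_stat falls in the critical area for the given tail."""
    if tail == "two":
        return abs(t_stat) > T_CRIT_TWO_TAILED
    if tail == "greater":
        return t_stat > T_CRIT_ONE_TAILED
    if tail == "less":
        return t_stat < -T_CRIT_ONE_TAILED
    raise ValueError("tail must be 'two', 'greater' or 'less'")


t_stat = 2.0  # hypothetical test statistic
for tail in ("greater", "less", "two"):
    print(tail, reject_null(t_stat, tail))
```

The same statistic falls in the critical area of the upper one-tailed test but not in that of the two-tailed test, because the two-tailed test splits the 5% between both ends of the distribution and therefore needs a larger value to reject.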

Explain the interpretations of t tests in different types of testing with suitable examples.

There are three types of t-tests we can perform based on the data at hand:

● One sample t-test

The one-sample t-test is a statistical hypothesis test used to determine whether an unknown population mean is
different from a specific value. You can use the test for continuous data. Your data should be a random sample
from a normal population. If your sample sizes are very small, you might not be able to test for normality. You
might need to rely on your understanding of the data. When you cannot safely assume normality, you can
perform a nonparametric test that doesn’t assume normality.

For the one-sample t-test, we need one variable.

We also have an idea, or hypothesis, that the mean of the population has some value. Here are two examples:
● A hospital has a random sample of cholesterol measurements for men. These patients were seen for
issues other than cholesterol. They were not taking any medications for high cholesterol. The
hospital wants to know if the unknown mean cholesterol for patients is different from a goal level
of 200 mg.
● We measure the grams of protein for a sample of energy bars. The label claims that the bars have
20 grams of protein. We want to know if the labels are correct or not.

● Independent two-sample t-test

The two-sample t-test (also known as the independent samples t-test) is a method used to test whether
the unknown population means of two groups are equal or not. You can use the test when your data
values are independent, are randomly sampled from two normal populations and the two independent
groups have equal variances. If your sample sizes are very small, you might not be able to test for
normality. You might need to rely on your understanding of the data. When you cannot safely assume
normality, you can perform a nonparametric test that doesn’t assume normality.

Here are a couple of examples:

● We have students who speak English as their first language and students who do not. All students
take a reading test. Our two groups are the native English speakers and the non-native speakers.
Our measurements are the test scores. Our idea is that the mean test scores for the underlying
populations of native and non-native English speakers are not the same. We want to know if the
mean score for the population of native English speakers is different from the people who learned
English as a second language.
● We measure the grams of protein in two different brands of energy bars. Our two groups are the
two brands. Our measurement is the grams of protein for each energy bar. Our idea is that the mean
grams of protein for the underlying populations for the two brands may be different. We want to
know if we have evidence that the mean grams of protein for the two brands of energy bars is
different or not.

● Paired sample t-test

The paired t-test is a method used to test whether the mean difference between pairs of measurements
is zero or not. You can use the test when your data values are paired measurements. For example, you
might have before-and-after measurements for a group of people. Also, the distribution of differences
between the paired measurements should be normally distributed. If your sample sizes are very small,
you might not be able to test for normality. You might need to rely on your understanding of the data.
Or, you can perform a nonparametric test that doesn’t assume normality.

Here are three examples:

● A group of people with dry skin use a medicated lotion on one arm and a non-medicated lotion on
their other arm. After a week, a doctor measures the redness on each arm. We want to know if the
medicated lotion is better than the non-medicated lotion. We do this by finding out if the arm with
medicated lotion has less redness than the other arm. Since we have pairs of measurements for each
person, we find the differences. Then we test if the mean difference is zero or not.
● We measure weights of people in a program to quit smoking. For each person, we have the weight
at the start and end of the program. We want to know if the mean weight change for people in the
program is zero or not.
● An instructor gives students an exam and the next day gives students a different exam on the same
material. The instructor wants to know if the two exams are equally difficult. We calculate the
difference in exam scores for each student. We test if the mean difference is zero or not.
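The one-sample and paired tests are closely related: a paired t-test is a one-sample t-test applied to the differences within each pair. The Python sketch below computes both test statistics with the standard library. The protein and weight figures are hypothetical values invented for illustration, and the function names are assumptions; the resulting statistic would still have to be compared with a critical value or converted to a P value.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean = mu0."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / standard_error, n - 1


def paired_t(before, after):
    """Paired t-test: a one-sample t-test on the differences, with H0: mean difference = 0."""
    differences = [a - b for a, b in zip(after, before)]
    return one_sample_t(differences, 0.0)


# Hypothetical grams of protein in five energy bars; the label claims 20 g.
protein = [21, 19, 22, 20, 23]
print(one_sample_t(protein, 20))

# Hypothetical weights of four people at the start and end of a program.
start = [70, 82, 65, 90]
end = [68, 80, 66, 85]
print(paired_t(start, end))
```

Reusing one_sample_t inside paired_t mirrors the description above: once the differences have been calculated, the only remaining question is whether their mean is zero.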
Exercise 5
S.No. Questions

1. Prepare a data set of 30 people who are going to GYM to reduce their weight.
GYM instructor wanted to know the effectiveness of GYM training module.
Apply T Test for the same, Prepare Hypothesis, Analyze and interpret the result.
2. Prepare a data set of 30 students.
Check whether the average weight of boys and girls is the same or
not. Prepare Hypothesis, Analyze and Interpret the results.
3. Two types of drugs were used on 5 and 7 patients for reducing weight. The weight reduction
after using the drugs for six months was as follows:
Drug A 10 12 13 11 14
Drug B 8 9 12 14 15 10 9

Is there a significant difference in the efficiency of the two
drugs? Test it at 5% level of significance.

Answer 1.

students weights dummy
1 52 0
2 35 0
3 86
4 84
5 42
6 56
7 53
8 57
9 80
10 90
11 98
12 89
13 87
14 86
15 84
16 53
17 62
18 67
19 69
20 61
21 53
22 59
23 75
24 77
25 70
26 79
27 89
28 81
29 82
30 60

Step 1- See if there is any difference between the averages of both the variables.
Step 2- Ho is the null hypothesis; H1/Ha is the alternative hypothesis.
Step 3- Level of significance is 0.05.
Step 4- If the average of variable 1 is not equal to the average of variable 2, the test is 2 tailed.
Step 5- We accept the null hypothesis because the value of P is more than the level
of significance, which is 0.05.
Answer 2
students boy girl
1 56 50
2 25 60
3 36 70
4 78 50
5 59 54
6 45 65
7 56 55
8 55 52
9 56 49
10 63 48
11 63 47
12 64 46
13 57 43
14 58 45
15 85 54
16 95 59
17 59 51
18 54 50
19 55 53
20 66 52
21 44 39
22 72 40
23 83 41
24 61 65
25 44 60
26 41 61
27 51 62
28 50 63
29 60 64
30 24 65
Step 1- Set the objective: see if there is any difference between the average weights of boys and girls.
Step 2- H0 is the null hypothesis (the averages are equal); H1 is the alternative hypothesis (the averages are not equal).
Step 3- The level of significance is 0.05.
Step 4- Because H1 states that the average of variable 1 is not equal to that of variable 2, the test is two-tailed.
Step 5- We accept (fail to reject) the null hypothesis because the value of P is more than the level of significance, which is 0.05.

Answer 3-

drug a drug b
10 8
12 9
13 12
11 14
14 15
   10
   9

Step 1- Set the objective: see if there is any difference between the averages of the two drugs.
Step 2- H0 is the null hypothesis (the averages are equal); H1 is the alternative hypothesis (the averages are not equal).
Step 3- The level of significance is 0.05.
Step 4- Because H1 states that the average of variable 1 is not equal to that of variable 2, the test is two-tailed.
Step 5- We accept (fail to reject) the null hypothesis because the value of P is more than the level of significance, which is 0.05.
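The sheet does not show the t-test output for this question, so the following is only a rough check of the conclusion. It assumes an equal-variance (pooled) two-sample t-test on the values given in the question and compares the statistic with the tabulated two-tailed 5% critical value for 10 degrees of freedom; the function name `pooled_t` is illustrative.

```python
import math
import statistics

# Values as given in Question 3 of Exercise 5.
drug_a = [10, 12, 13, 11, 14]
drug_b = [8, 9, 12, 14, 15, 10, 9]


def pooled_t(x, y):
    """Return (t statistic, degrees of freedom) for an equal-variance t-test."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / df
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se, df


t_stat, df = pooled_t(drug_a, drug_b)
T_CRIT_TWO_TAIL = 2.228  # tabulated two-tailed 5% critical value, df = 10
reject_h0 = abs(t_stat) > T_CRIT_TWO_TAIL
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

Under these assumptions the statistic stays well inside the critical value, which agrees with Step 5.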
Exercise 6

S. No. Questions (Relevant Course Outcomes)

1. The following data set is related to the height of boys and girls where the samples have been drawn independently. Test for their mean difference. (Independent t test) (CO3 & CO4)
   Boys 150 145 146 135 151 160 148 149 155 152
   Girls 140 132 139 141 145 151 146 147 140 144
2. Two drugs, A and B, are tried on some patients as pain relievers. Check whether the drugs differ significantly in their effect on relieving pain. You may use the 5% level of significance. Assume that the variances of the two populations are not the same. (CO3 & CO4)
   Drug A 12 10 14 16 9 17 15 13
   Drug B 14 12 9 8 14 13 14
3. In a certain experiment to compare the temperature at two times, morning and evening, the following temperatures were observed 10 times: (CO3 & CO4)
   Morning 21, 20, 23, 24, 19, 20, 17, 22, 23, 25
   Evening 20, 19, 22, 20, 19, 20, 18, 21, 20, 21
   Examine the significance of the change in temperature between morning and evening. (paired t test)
4. Take a sample of 20 employees at random from a large population and measure their performance before and after training. Check whether the training makes a significant difference. You may use the 5% level of significance. (CO3 & CO4)
5. Test (at the 5% level of significance) whether the population mean age is significantly greater than 20, given the following ages: (one sample t test) (CO3 & CO4)
   Age 36, 28, 21, 25, 31, 17, 22, 18, 18, 29, 21, 26, 17, 18, 30, 19, 19, 28
Answer1

BOYS GIRLS
150 140
145 132
146 139
135 141
151 145
160 151
148 146
149 147
155 140
152 144

t-Test: Two-Sample Assuming Unequal Variances

  Variable 1 Variable 2
Mean 149.1 142.5
Variance 43.65556 27.83333
Observations 10 10
Hypothesized Mean Difference 0  
Df 17  
t Stat 2.468452  
P(T<=t) one-tail 0.012238  
t Critical one-tail 1.739607  
P(T<=t) two-tail 0.024476  
t Critical two-tail 2.109816  
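The unequal-variance output above can be re-derived by hand. The sketch below assumes the usual Welch formulas (the helper name `welch_t` is illustrative) and takes the two-tail critical value from the table rather than computing it.

```python
import math
import statistics

boys = [150, 145, 146, 135, 151, 160, 148, 149, 155, 152]
girls = [140, 132, 139, 141, 145, 151, 146, 147, 140, 144]


def welch_t(x, y):
    """Return (t statistic, Welch-Satterthwaite df) for unequal variances."""
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


t_stat, df = welch_t(boys, girls)
T_CRIT_TWO_TAIL = 2.109816  # "t Critical two-tail" from the table above
reject_h0 = abs(t_stat) > T_CRIT_TWO_TAIL
print(f"t = {t_stat:.6f}, df = {df:.2f}, reject H0: {reject_h0}")
```

The statistic exceeds the critical value (equivalently, the two-tail p-value 0.024 is below 0.05), so the null hypothesis of equal mean heights is rejected.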

Answer 2

drug A drug B
12 14
10 12
14 9
16 8
9 14
17 13
15 14
13  
   
Step 1 Objective: is there any difference between the efficacy of the two medicines?
Step 2 Null hypothesis, H0: AVG(Med A) = AVG(Med B)
Alternate hypothesis, H1: AVG(Med A) is not equal to AVG(Med B)
Step 3 The test is two-tailed, because H1 covers both Med A > Med B and Med A < Med B
Step 4 LOS assumed to be 5%
Step 5 Solution: refer to the table below
Step 6 The two-tail PV (0.380) > LOS (0.05), so we fail to reject the null hypothesis.
Step 7 Comment: there is no significant difference between the effects of Drug A and Drug B on relieving pain.
t-Test: Two-Sample Assuming Unequal Variances    
     
  Variable 1 Variable 2
Mean 13.25 12
Variance 7.928571 6.333333
Observations 8 7
Hypothesized Mean Difference 0  
df 13  
t Stat 0.907841  
P(T<=t) one-tail 0.190242  
t Critical one-tail 1.770933  
P(T<=t) two-tail 0.380484  
t Critical two-tail 2.160369  

Answer3
Paired sample T-Test

Temperature
Place Morning Evening
1 21 20
2 20 19
3 23 22
4 24 20
5 19 19
6 20 20
7 17 18
8 22 21
9 23 20
10 25 21

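Steps
Step 1 Objective: see if there is any difference between the average morning and evening temperatures
Step 2 Null hypothesis, H0: AVG(Morning) = AVG(Evening)
Alternate hypothesis, H1: AVG(Morning) is not equal to AVG(Evening)
Step 3 The level of significance is 0.05
Step 4 The two-tail p-value (0.029) is less than the level of significance (0.05), so we reject the null hypothesis: the change in temperature between morning and evening is significant.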
t-Test: Paired Two Sample for Means

  Morning Evening
Mean 21.4 20
Variance 6.044444444 1.333333333
Observations 10 10
Pearson Correlation 0.782780364
Hypothesized Mean Difference 0
df 9
t Stat 2.584921311
P(T<=t) one-tail 0.014728927
t Critical one-tail 1.833112923
P(T<=t) two-tail 0.029457854
t Critical two-tail 2.262157158  
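The paired output above depends only on the ten morning-minus-evening differences. A minimal sketch of that calculation, with the critical value copied from the table:

```python
import math
import statistics

morning = [21, 20, 23, 24, 19, 20, 17, 22, 23, 25]
evening = [20, 19, 22, 20, 19, 20, 18, 21, 20, 21]

# A paired t-test is a one-sample t-test on the per-place differences.
diffs = [m - e for m, e in zip(morning, evening)]
n = len(diffs)
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
df = n - 1

T_CRIT_TWO_TAIL = 2.262157158  # "t Critical two-tail" from the table above
reject_h0 = abs(t_stat) > T_CRIT_TWO_TAIL
print(f"t = {t_stat:.6f}, df = {df}, reject H0: {reject_h0}")
```

Since the statistic is larger than the two-tail critical value, the data reject the hypothesis of no change at the 5% level.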

Answer4

Student Before Training (X) After training (Y)


1 18 22
2 21 25
3 16 17
4 22 24
5 19 16
6 27 29
7 17 20
8 21 23
9 23 19
10 18 20
11 14 15
12 16 15
13 16 18
14 19 26
15 18 18
16 20 24
17 12 18
18 22 15
19 15 19
20 17 16

Steps :-
Step 1 Objective: is there any difference in performance before and after training?
Step 2 Null hypothesis, H0: AVG(X) = AVG(Y)
Alternate hypothesis, H1: AVG(X) is not equal to AVG(Y)
Step 3 It is a two-tail hypothesis: X > Y or X < Y
Step 4 LOS assumed to be 5%
Step 5 Solution: refer to the table below
Step 6 The two-tail PV (0.079) > LOS (0.05), so we accept (fail to reject) the null hypothesis.
Step 7 Hence, there is no significant difference between the two variables X and Y
t-Test: Paired Two Sample for Means    
   
  Before Training (X) After training (Y)
Mean 18.55 19.95
Variance 12.15526316 16.68157895
Observations 20 20
Pearson Correlation 0.611892068  
Hypothesized Mean Difference 0  
Df 19  
t Stat -1.853489777  
P(T<=t) one-tail 0.03970319  
t Critical one-tail 1.729132792  
P(T<=t) two-tail 0.079406379  
t Critical two-tail 2.09302405  
Answer 5

Age dummy
36 0
28 0
21
25
31
17
22
18
18
29
21
26
17
18
30
19
28

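Steps :-
Step 1 Objective: determine whether the population mean age is significantly greater than 20
Step 2 Null hypothesis, H0: mean age <= 20
Alternate hypothesis, H1: mean age > 20
Step 3 It is a one-tail hypothesis
Step 4 LOS assumed to be 5%
Step 5 Solution: refer to the table below. The table compares the 17 ages listed above with the dummy column of zeros, so its t Stat (16.82) tests against a mean of 0. Against the hypothesized mean of 20, t = (23.7647 - 20) / sqrt(33.9412 / 17) ≈ 2.66.
Step 6 Since 2.66 exceeds the one-tail critical value (1.746), we reject the null hypothesis: the mean age is significantly greater than 20.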
t-Test: One-Sample

  Variable 1  
Mean 23.76470588
Variance 33.94117647
Observations 17
Hypothesized Mean Difference 0
df 16
t Stat 16.81874006
P(T<=t) one-tail 0.00
t Critical one-tail 1.745883669
P(T<=t) two-tail 0.00
t Critical two-tail 2.119905285  
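The table above reports a hypothesized mean difference of 0, which is what a comparison against the dummy column of zeros produces. The sketch below uses the 17 ages from the answer sheet and computes the one-sample statistic both against 0 (to match the table) and against 20 (the value the question asks about); the critical value is the one-tail figure from the table.

```python
import math
import statistics

# The 17 ages entered in the answer sheet (the question lists 18).
ages = [36, 28, 21, 25, 31, 17, 22, 18, 18, 29, 21, 26, 17, 18, 30, 19, 28]


def one_sample_t(sample, mu0):
    """t statistic for H0: population mean equals mu0."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - mu0) / se


t_vs_zero = one_sample_t(ages, 0)    # reproduces the t Stat in the table
t_vs_20 = one_sample_t(ages, 20)     # the test the question asks for
T_CRIT_ONE_TAIL = 1.745883669        # "t Critical one-tail", df = 16
reject_h0 = t_vs_20 > T_CRIT_ONE_TAIL
print(f"t vs 0 = {t_vs_zero:.4f}, t vs 20 = {t_vs_20:.4f}, reject H0: {reject_h0}")
```

The statistic against 20 is much smaller than the one in the table, but it still exceeds the one-tail critical value, so the conclusion (reject the null hypothesis) is unchanged.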
Exercise 7

S.No. Questions

1. Compare the cholesterol contents of the four competing diet foods on the basis of the following data (in milligrams per package) which were obtained for three randomly taken 6 ounce packages of each of the diet foods. The measurement of the cholesterol content was performed in three different laboratories. Test whether the difference among the sample means can be attributed to chance at the 5% level of significance.
2. The following table presents the number of defective pieces produced by three workmen operating in turn on three different machines. Conduct 2 Way ANOVA to test at the 5% level of significance, whether:
   1. The differences among the means obtained for the three workmen can be attributed to chance
   2. The differences among the means obtained for the three machines can be attributed to chance
3. The following data refers to the salary packages (in Lakhs) offered to MBA graduates with three specializations and having studied at 4 different business schools. For the sake of simplification, only two students are taken for each interaction between the institute and field of specialization. Test the hypotheses:
   1. Whether the difference between the pay packages offered by different business schools can be attributed to chance
   2. Average pay packages by all specializations are equal
   3. The average pay packages for 12 interactions are equal
Answer 1

Laboratory
Diet Food
One Two Three
Diet food A 3.6 4.1 4
Diet Food B 3.1 3.2 3.9
Diet Food C 3.2 3.5 3.5
Diet Food D 3.5 3.8 3.8
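Step – I : Objective: to check whether the mean cholesterol content differs significantly among the diet foods and among the laboratories

Step – II : H0: µ(A) = µ(B) = µ(C) = µ(D) for the diet foods, and µ(One) = µ(Two) = µ(Three) for the laboratories

Step – III : H1: at least one of the means is not equal to the others

Step – IV : Level of significance is 0.05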

Anova: Two-Factor Without Replication

The output below covers only Diet Foods B, C and D: the Diet Food A row (3.6, 4.1, 4) was read as the column labels.

SUMMARY Count Sum Average Variance
Diet Food B 3 10.2 3.4 0.19
Diet Food C 3 10.2 3.4 0.03
Diet Food D 3 11.1 3.7 0.03

3.6 3 9.8 3.266667 0.043333
4.1 3 10.5 3.5 0.09
4 3 11.2 3.733333 0.043333

ANOVA
Source of Variation SS df MS F P-value F crit
Rows 0.18 2 0.09 2.076923 0.240655 6.944272
Columns 0.326667 2 0.163333 3.769231 0.120178 6.944272
Error 0.173333 4 0.043333

Total 0.68 8
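Because the output above covers only three of the four diet foods, it may help to redo the two-factor calculation by hand. The sketch below is a plain implementation of two-way ANOVA without replication (the function name `two_way_anova` is illustrative); it first reproduces the F ratios above from rows B, C and D, then repeats the calculation with Diet Food A included.

```python
def two_way_anova(table):
    """Return (F for rows, F for columns) for a table with one value per cell."""
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(col) / r for col in zip(*table)]
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    # Whatever rows and columns do not explain is the error term.
    ms_error = (ss_total - ss_rows - ss_cols) / ((r - 1) * (c - 1))
    return ss_rows / (r - 1) / ms_error, ss_cols / (c - 1) / ms_error


# Rows are diet foods, columns are laboratories One, Two, Three.
foods = {
    "A": [3.6, 4.1, 4.0],
    "B": [3.1, 3.2, 3.9],
    "C": [3.2, 3.5, 3.5],
    "D": [3.5, 3.8, 3.8],
}

f_foods_bcd, f_labs_bcd = two_way_anova([foods["B"], foods["C"], foods["D"]])
f_foods, f_labs = two_way_anova(list(foods.values()))
print(f"B, C, D only: F rows = {f_foods_bcd:.4f}, F columns = {f_labs_bcd:.4f}")
print(f"All four:     F rows = {f_foods:.4f}, F columns = {f_labs:.4f}")
```

With all four foods the degrees of freedom become 3 (foods), 2 (laboratories) and 6 (error). The F ratios then come out near 4.91 and 5.73, against tabulated 5% critical values of about 4.76 for (3, 6) and 5.14 for (2, 6), so both effects would be significant once Diet Food A is included.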

Answer.2
  Machine 1 Machine 2 Machine 3
Workman 1 27 34 23
Workman 2 29 32 25
Workman 3 22 30 22

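Step – I : Objective: to check whether the mean number of defective pieces differs significantly among the workmen and among the machines
Step – II : Set the null and alternative hypotheses
Step – III : H1: at least one of the means is not equal to the others
Step – IV : H0 (workmen): µ(W1) = µ(W2) = µ(W3)
Step – V : H0 (machines): µ(M1) = µ(M2) = µ(M3)
Step – VI : Level of significance is 0.05
Step – VII : For the workmen the p-value (0.560) is more than 0.05, so the null hypothesis is accepted; for the machines the p-value (0.015) is less than 0.05, so the null hypothesis is rejected.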
Anova: Single Factor (workmen)

SUMMARY
Groups Count Sum Average Variance
Workman 1 3 84 28 31
Workman 2 3 86 28.66667 12.33333
Workman 3 3 74 24.66667 21.33333

ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 27.55555556 2 13.77778 0.639175 0.560215 5.143253
Within Groups 129.3333333 6 21.55556

Total 156.8888889 8

Anova: Single Factor (machines)

SUMMARY
Groups Count Sum Average Variance
Machine 1 3 78 26 13
Machine 2 3 96 32 4
Machine 3 3 70 23.33333 2.333333

ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 118.2222 2 59.11111 9.172414 0.01497 5.143253
Within Groups 38.66667 6 6.444444

Total 156.8889 8
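The question asks for a two-way analysis, while the sheet runs two single-factor analyses on the same table. The sketch below only re-derives those two single-factor F ratios (the helper name `one_way_anova` is illustrative) and compares them with the F crit value shown above.

```python
import statistics


def one_way_anova(groups):
    """Return (F, df between, df within) for a list of samples."""
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = len(groups) - 1, n - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Defective pieces: rows are workmen 1-3, columns are machines 1-3.
defects = [
    [27, 34, 23],
    [29, 32, 25],
    [22, 30, 22],
]

f_workmen, df_b, df_w = one_way_anova(defects)
f_machines, _, _ = one_way_anova([list(col) for col in zip(*defects)])

F_CRIT = 5.143253  # "F crit" for (2, 6) degrees of freedom, from the tables
print(f"workmen:  F = {f_workmen:.6f}, significant: {f_workmen > F_CRIT}")
print(f"machines: F = {f_machines:.6f}, significant: {f_machines > F_CRIT}")
```

Only the machine effect exceeds the critical value, matching the p-values 0.560 and 0.015 in the two tables.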
Answer 3 :

Specialization I II III IV
Marketing 6 4 8 6
Marketing 5 5 6 4
Finance 7 6 6 9
Finance 6 7 7 8
Operations 8 5 10 9
Operations 7 5 9 10

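Step – I : Objective: to check whether pay packages differ significantly among business schools and among specializations, and whether the two factors interact
Step – II : Set the null and alternative hypotheses
Step – III : H1: at least one of the means is not equal to the others
Step – IV : H0 (specializations): µ(M) = µ(F) = µ(O)
Step – V : H0 (business schools): µ(I) = µ(II) = µ(III) = µ(IV); H0 (interaction): the average pay packages for the 12 interactions are equal
Step – VI : Level of significance is 0.05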
             
Anova: Two-Factor With Replication

The output below groups the six data rows into two samples of three rows each (the first labelled Marketing, the second unlabelled), not into the three specializations of two rows each.

SUMMARY I II III IV Total
Marketing
Count 3 3 3 3 12
Sum 18 15 20 19 72
Average 6 5 6.666667 6.333333 6
Variance 1 1 1.333333 6.333333 2.181818

(second sample)
Count 3 3 3 3 12
Sum 21 17 26 27 91
Average 7 5.666666667 8.666667 9 7.583333
Variance 1 1.333333333 2.333333 1 2.992424

Total
Count 6 6 6 6
Sum 39 32 46 46
Average 6.5 5.333333333 7.666667 7.666667
Variance 1.1 1.066666667 2.666667 5.066667

ANOVA
Source of Variation SS df MS F P-value F crit
Sample 15.04166667 1 15.04167 7.847826 0.012804 4.493998
Columns 22.45833333 3 7.486111 3.905797 0.028675 3.238872
Interaction 3.791666667 3 1.263889 0.65942 0.58887 3.238872
Within 30.66666667 16 1.916667

Total 71.95833333 23
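The Sample row above has 1 degree of freedom, which corresponds to two groups rather than three specializations. The following sketch redoes the calculation with two students per specialization and school, as described in the question. It is a hand-rolled two-factor ANOVA with replication; the 5% critical values are tabulated figures entered as constants, not computed.

```python
# cells[specialization][school] holds the two observed pay packages.
data = {
    "Marketing":  [[6, 5], [4, 5], [8, 6], [6, 4]],
    "Finance":    [[7, 6], [6, 7], [6, 7], [9, 8]],
    "Operations": [[8, 7], [5, 5], [10, 9], [9, 10]],
}
cells = list(data.values())
a, b, n = len(cells), len(cells[0]), len(cells[0][0])

values = [x for row in cells for cell in row for x in cell]
grand = sum(values) / len(values)
row_means = [sum(x for cell in row for x in cell) / (b * n) for row in cells]
col_means = [sum(x for row in cells for x in row[j]) / (a * n) for j in range(b)]

ss_total = sum((x - grand) ** 2 for x in values)
ss_rows = b * n * sum((m - grand) ** 2 for m in row_means)   # specializations
ss_cols = a * n * sum((m - grand) ** 2 for m in col_means)   # business schools
ss_within = sum((x - sum(cell) / n) ** 2
                for row in cells for cell in row for x in cell)
ss_inter = ss_total - ss_rows - ss_cols - ss_within

df_rows, df_cols = a - 1, b - 1
df_inter, df_within = df_rows * df_cols, a * b * (n - 1)
ms_within = ss_within / df_within

f_rows = ss_rows / df_rows / ms_within
f_cols = ss_cols / df_cols / ms_within
f_inter = ss_inter / df_inter / ms_within

# Tabulated 5% critical values for (2, 12), (3, 12) and (6, 12) df.
F_CRIT = {"rows": 3.885, "cols": 3.490, "inter": 2.996}
print(f"specializations: F = {f_rows:.3f} (crit {F_CRIT['rows']})")
print(f"schools:         F = {f_cols:.3f} (crit {F_CRIT['cols']})")
print(f"interaction:     F = {f_inter:.3f} (crit {F_CRIT['inter']})")
```

The total and Columns sums of squares agree with the output above, but with three specializations the within-cell error shrinks and the degrees of freedom become 2, 3, 6 and 12. Under this grouping all three F ratios exceed their critical values.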
Exercise 8

S.No. Questions

1. Compare the cholesterol contents of the four competing diet foods on the basis of the following data (in milligrams per package) which were obtained for three randomly taken 6 ounce packages of each of the diet foods:
   Diet Food A 3.6 4.1 4.0
   Diet Food B 3.1 3.2 3.9
   Diet Food C 3.2 3.5 3.5
   Diet Food D 3.5 3.8 3.8
   Test whether the difference among the sample means can be attributed to chance at the 5% level of significance.
2. The following are the numbers of words per minute which a secretary typed on several occasions on three different typewriters:
   TypeWriter1 71 78 70 69 77 72 65 69
   TypeWriter2 74 76 72 70 69 68 72 73
   TypeWriter3 70 72 66 64 63 67 69 70
   Test whether the differences among the means of the three samples can be attributed to chance at the 5% level of significance.
3. The following set of data is obtained for the sales of a product corresponding to three price levels: Rs.39/-, Rs.44/- and Rs.49/-. The data pertains to five randomly selected retail stores where the product was sold.
   Price Levels Sales (in Rs. Lakhs)
   Rs.39/- 8 12 10 9 11
   Rs.44/- 7 10 6 8 9
   Rs.49/- 4 8 7 9 7
   Test whether the difference in sales corresponding to various price levels can be attributed to chance at the 5% level of significance. In case of significant difference, carry out further analysis.

Answer 1:
Laboratory
Diet Food

Diet food A 3.6 4.1 4


Diet Food B 3.1 3.2 3.9
Diet Food C 3.2 3.5 3.5
Diet Food D 3.5 3.8 3.8
Step – I : Objective: to check whether the mean cholesterol content differs significantly among the diet foods

Step – II : H0: µ(A) = µ(B) = µ(C) = µ(D)

Step – III : H1: at least one of the means is not equal to the others

Step – IV : Level of significance is 0.05

Anova: Two-Factor Without Replication

The output below covers only Diet Foods B, C and D: the Diet Food A row (3.6, 4.1, 4) was read as the column labels.

SUMMARY Count Sum Average Variance
Diet Food B 3 10.2 3.4 0.19
Diet Food C 3 10.2 3.4 0.03
Diet Food D 3 11.1 3.7 0.03

3.6 3 9.8 3.266667 0.043333
4.1 3 10.5 3.5 0.09
4 3 11.2 3.733333 0.043333

ANOVA
Source of Variation SS df MS F P-value F crit
Rows 0.18 2 0.09 2.076923 0.240655 6.944272
Columns 0.326667 2 0.163333 3.769231 0.120178 6.944272
Error 0.173333 4 0.043333

Total 0.68 8
Answer 2:
TypeWriter1 71 78 70 69 77 72 65 69
TypeWriter2 74 76 72 70 69 68 72 73
TypeWriter3 70 72 66 64 63 67 69 70

Step – I : Objective: to check whether the mean typing speed differs significantly among the three typewriters
Step – II : Set the null and alternative hypotheses
Step – III : H1: at least one of the means is not equal to the others
Step – IV : H0: µ(1) = µ(2) = µ(3)
Step – V : Level of significance is 0.05
Step – VI : The p-value (0.0485) is less than 0.05 and F (3.507) is more than F crit (3.467), so the null hypothesis is rejected.

Anova: Single Factor            


             
SUMMARY            
Groups Count Sum Average Variance    
TypeWriter1 8 571 71.375 18.55357    
TypeWriter2 8 574 71.75 7.071429    
TypeWriter3 8 541 67.625 9.982143    
             
             
ANOVA            
Source of Variation SS df MS F P-value F crit
Between Groups 83.25 2 41.625 3.507021 0.048513 3.4668
Within Groups 249.25 21 11.86905      
             
Total 332.5 23        
Answer 3:
Price Levels Sales (in Rs. Lakhs)
Rs.39/- 8 12 10 9 11
Rs.44/- 7 10 6 8 9
Rs.49/- 4 8 7 9 7

Step – I : Objective: to check whether mean sales differ significantly among the three price levels
Step – II : Set the null and alternative hypotheses
Step – III : H1: at least one of the means is not equal to the others
Step – IV : H0: µ(Rs.39) = µ(Rs.44) = µ(Rs.49)
Step – V : Level of significance is 0.05
Step – VI : For the price levels (Rows) the p-value (0.0097) is less than 0.05, so the null hypothesis is rejected.

Anova: Two-Factor Without Replication
             
SUMMARY Count Sum Average Variance    
Rs.39/- 5 50 10 2.5    
Rs.44/- 5 40 8 2.5    
Rs.49/- 5 35 7 3.5    
             
Sales (in Rs. Lakhs) 3 19 6.333333 4.333333    
  3 30 10 4    
  3 23 7.666667 4.333333    
  3 26 8.666667 0.333333    
  3 27 9 4    
             
             
ANOVA            
Source of Variation SS df MS F P-value F crit
Rows 23.33333 2 11.66667 8.75 0.009687 4.45897
Columns 23.33333 4 5.833333 4.375 0.03628 3.837853
Error 10.66667 8 1.333333      
             
Total 57.33333 14        
Phase 3
Assignment 5

S.No. Questions

1. What do you understand by non-parametric tests? How do they differ from parametric tests?
2. Do we have non-parametric tests corresponding to parametric tests? If yes, then explain.
3. What do you mean by the Chi Square Test? What are its advantages and limitations? Is there any assumption for this test?

Non-parametric tests are the mathematical methods used in statistical hypothesis testing which do not make assumptions about the frequency distribution of the variables that are to be evaluated. They are used when the data are skewed, and they comprise techniques that do not depend on the data following any particular distribution.
The word non-parametric does not mean that these models do not have any parameters. Rather, the characteristics and number of parameters are flexible and not predefined. Therefore, these models are called distribution-free models.
The key difference between parametric and nonparametric tests is that a parametric test relies on an assumed statistical distribution of the data, whereas a nonparametric test does not depend on any particular distribution. Nonparametric tests make no distributional assumptions and measure the central tendency with the median value.
The key differences between nonparametric and parametric tests are listed below by property.

Properties                  Parametric            Non-parametric
Assumptions                 Yes                   No
Central tendency value      Mean value            Median value
Correlation                 Pearson               Spearman
Probabilistic distribution  Normal                Arbitrary
Population knowledge        Requires              Does not require
Used for                    Interval data         Nominal data
Applicability               Variables             Attributes & Variables
Examples                    z-test, t-test, etc.  Kruskal-Wallis, Mann-Whitney


S.No. Questions

1. A manager of ABC ice-cream parlor has to decide how much of each flavor of ice-cream he should stock so that the demands of the customers are satisfied. The ice cream supplier claims that among the four most popular flavours, 62% of customers prefer vanilla, 18% chocolate, 12% strawberry and 8% mango. A random sample of 200 customers produces the results below. At the alpha = 0.05 significance level, test the claim that the percentages given by the supplier are correct.
   Flavour Vanilla Chocolate Strawberry Mango
   No Preferring 120 40 18 22
2. An insurance company provides auto insurance and is analyzing the data obtained from fatal crashes. A sample of the motor vehicle deaths is randomly selected for a two year period. The number of fatalities is listed below for the different days of the week. At the 0.05 LOS, test the claim that accidents occur on different days with equal frequency.
   Days Monday Tuesday Wednesday Thursday Friday Saturday Sunday
   No of fatalities 31 20 20 22 22 29 36

Answer -1
Flavour Number Preferring probability expected
vanilla 120 0.62 124
chocolate 40 0.18 36
strawberry 18 0.12 24
mango 22 0.08 16
total 200

steps
1. The null hypothesis is that all the proportions are as the supplier claims:
p(v) = 0.62
p(c) = 0.18
p(s) = 0.12
p(m) = 0.08

2. The alternative hypothesis is that at least one of the proportions is not as claimed:
p(v) ≠ 0.62, or
p(c) ≠ 0.18, or
p(s) ≠ 0.12, or
p(m) ≠ 0.08

chi square value 4.323
p value 0.228586623

compare p value with LOS = 0.05

As the p value is greater than the level of significance, we accept (fail to reject) the null hypothesis.
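The p value above can be checked without Excel. The sketch below computes the goodness-of-fit statistic from the observed and expected counts and then converts it to a p value with a small helper, `chi2_sf`, which is an illustrative name for a standard recurrence on the chi-square upper tail for whole-number degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable for integer df >= 1."""
    half = x / 2
    if df % 2:
        k, p = 1, math.erfc(math.sqrt(half))   # closed form for df = 1
    else:
        k, p = 2, math.exp(-half)              # closed form for df = 2
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        p += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return p


observed = [120, 40, 18, 22]          # vanilla, chocolate, strawberry, mango
claimed = [0.62, 0.18, 0.12, 0.08]    # proportions used in the answer table
total = sum(observed)
expected = [total * p for p in claimed]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_value = chi2_sf(chi_sq, df=len(observed) - 1)
print(f"chi-square = {chi_sq:.4f}, p = {p_value:.6f}, reject H0: {p_value < 0.05}")
```

The p value agrees with the one in the answer, and because it is above 0.05 the supplier's claimed percentages are not rejected.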

Answer-2

Day number of fatalities expected
Monday 31 25.71429
Tuesday 20 25.71429
Wednesday 20 25.71429
Thursday 22 25.71429
Friday 22 25.71429
Saturday 29 25.71429
Sunday 36 25.71429
total 180

chi test (p value)
0.160874559

steps
1 null hypothesis
  accidents occur on the different days of the week with equal frequency
2 alternative hypothesis
  accidents do not occur on the different days of the week with equal frequency
3 the level of significance is 0.05
  calculated p value of the chi square test: 0.161

Hence the calculated p value (0.161) is greater than the LOS (0.05), so we accept (fail to reject) the null hypothesis,
which says that accidents occur on the different days with equal frequency.
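The figure of about 0.16 reported above is the p value of the test rather than the chi-square statistic itself. A short sketch that separates the two, assuming equal expected counts for all seven days; the p value uses the closed-form upper tail that holds for 6 degrees of freedom, and the critical value is the tabulated 5% figure.

```python
import math

observed = [31, 20, 20, 22, 22, 29, 36]   # Monday through Sunday
expected = sum(observed) / len(observed)  # equal frequency on every day

chi_sq = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1

# For df = 6 the upper tail is exp(-h) * (1 + h + h^2 / 2) with h = chi_sq / 2.
h = chi_sq / 2
p_value = math.exp(-h) * (1 + h + h ** 2 / 2)

CHI2_CRIT = 12.592  # tabulated 5% critical value for df = 6
reject_h0 = chi_sq > CHI2_CRIT
print(f"chi-square = {chi_sq:.4f}, df = {df}, p = {p_value:.6f}, reject H0: {reject_h0}")
```

Both comparisons point the same way: the statistic is below the critical value and the p value is above 0.05, so equal frequency across days is not rejected.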
S.No. Questions

1. The following table gives the number of good and defective parts produced by each of the three shifts in a factory:
   Shift Good Defective Total
   Day 900 130 1030
   Evening 700 170 870
   Night 400 200 600
   Total 2000 500 2500
   Is there any association between the shift and the quality of the parts produced? Use a 0.05 level of significance.
shift good defective total


day 900 130 1030
evening 700 170 870
night 400 200 600
total 2000 500 2500

Expected

shift good defective total


day 824 206 1030
evening 696 174 870
night 480 120 600
total 2000 500 2500

observation expected
900 824
700 696
400 480
130 206
170 174
200 120

chi test (p value)
0.00

steps
1 null hypothesis
  there is no relation between shift and the quality of the parts produced
2 alternative hypothesis
  there is a relation between shift and the quality of the parts produced
3 the level of significance is 0.05
  calculated p value of the chi square test: 0.000

Hence the calculated p value is less than the level of significance, so we reject the null hypothesis and accept the alternative hypothesis,
which says there is a relation between shift and the quality of the parts produced.
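The expected counts in this answer come from the row and column totals. A minimal sketch of that step and of the resulting chi-square statistic for the 3 x 2 table; the p value line uses the closed form that is exact only for 2 degrees of freedom, which is the case here.

```python
import math

# Rows: day, evening, night. Columns: good, defective.
observed = [
    [900, 130],
    [700, 170],
    [400, 200],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[rt * ct / grand for ct in col_totals] for rt in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)
p_value = math.exp(-chi_sq / 2)  # upper tail of chi-square, valid for df = 2

print(f"expected = {expected}")
print(f"chi-square = {chi_sq:.3f}, df = {df}, p = {p_value:.3g}")
```

The expected counts match the table in the answer, and the p value is far below 0.05, which is why Excel displays it as 0.00.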
