
Fairfield Institute of Management and Technology

Research Methodology Lab


208

NAAC ACCREDITED

Submitted to: Dr. Suman Yadav, Assistant Professor

Submitted by: Ajay Dhoundiyal, 41951401718, BBA(G), IV-C
Table of Contents

1. Introduction to data analysis
2. Creating Charts in MS Excel: Bar Chart, Pie Chart
3. Making a Histogram in Excel
4. T-test: Introduction and T-test in Excel
5. Z-test in Excel
Lab Assignment 1

Introduction to data analysis


Data analysis is defined as a process of cleaning, transforming, and modelling data to discover
useful information for business decision-making. The purpose of data analysis is to extract
useful information from data and to take decisions based on that information.

Whenever we make a decision in our day-to-day life, we think about what happened last time or
what will happen if we choose that particular option. This is nothing but analysing our past or
future and making decisions based on it. For that, we gather memories of our past or dreams of
our future. That is data analysis. When an analyst does the same thing for business purposes,
it is called Data Analysis.

Types of Data Analysis: Techniques and Methods

Several types of data analysis techniques exist, depending on the business and the technology.
The major types of data analysis are:

• Text Analysis
• Statistical Analysis
• Diagnostic Analysis
• Predictive Analysis
• Prescriptive Analysis

Text Analysis
Text Analysis is also referred to as Data Mining. It is a method of discovering patterns in large
data sets using databases or data mining tools, and it is used to transform raw data into
business information. Business Intelligence tools available on the market are used to take
strategic business decisions. Overall, it offers a way to extract and examine data, derive
patterns, and finally interpret the data.

Statistical Analysis

Statistical Analysis shows "What happened?" by using past data, often in the form of dashboards.
Statistical analysis includes the collection, analysis, interpretation, presentation, and
modelling of data. It analyses a set of data or a sample of data. There are two categories of
this type of analysis: Descriptive Analysis and Inferential Analysis.

Descriptive Analysis

It analyses the complete data, or a sample, as summarized numerical data. It shows the mean and
standard deviation for continuous data, and percentages and frequencies for categorical data.
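As a minimal sketch (not part of the original lab), the two kinds of descriptive summary mentioned above can be computed with Python's standard library. The marks and department labels are hypothetical example values; `stdev` gives the sample standard deviation.

```python
from collections import Counter
from statistics import mean, stdev

def describe_continuous(values):
    """Mean and sample standard deviation for continuous data."""
    return {"mean": mean(values), "std_dev": stdev(values)}

def describe_categorical(labels):
    """Frequency and percentage of each category for categorical data."""
    counts = Counter(labels)
    total = len(labels)
    return {category: (n, 100 * n / total) for category, n in counts.items()}

# Hypothetical example data
marks = [35, 56, 42, 35, 61]
departments = ["Sales", "HR", "Sales", "Finance", "Sales"]

print(describe_continuous(marks))
print(describe_categorical(departments))
```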
Inferential Analysis

It analyses a sample drawn from the complete data and uses it to draw conclusions about the
whole. Because it works from samples, you can reach different conclusions from the same data by
selecting different samples.
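A small simulation can illustrate why different samples can lead to different conclusions. This sketch assumes a hypothetical population of 1,000 marks generated at random; each simple random sample of 30 gives its own estimate of the population mean, and those estimates generally differ from one another.

```python
import random
from statistics import mean

# Hypothetical "complete data": 1,000 marks between 0 and 100
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.randint(0, 100) for _ in range(1000)]

def sample_mean(data, size, seed):
    """Draw a simple random sample (without replacement) and return its mean."""
    return mean(random.Random(seed).sample(data, size))

print("population mean:", mean(population))
for seed in (1, 2, 3):
    print(f"sample {seed} mean:", sample_mean(population, 30, seed))
```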

Diagnostic Analysis

Diagnostic Analysis shows "Why did it happen?" by finding the cause behind the insight found in
Statistical Analysis. This analysis is useful for identifying behaviour patterns in data. If a
new problem arises in your business process, you can look into this analysis to find similar
patterns of that problem, and you may be able to apply similar remedies to the new problem.

Predictive Analysis
Predictive Analysis shows "what is likely to happen" by using previous data. The simplest
example: if last year I bought two dresses based on my savings, and this year my salary doubles,
then I can buy four dresses. Of course, it is not that simple, because you have to consider other
circumstances: the prices of clothes may have increased this year, or instead of dresses you may
want to buy a new bike, or you may need to buy a house!

So this analysis makes predictions about future outcomes based on current or past data. A
forecast is only an estimate; its accuracy depends on how much detailed information you have and
how deeply you dig into it.
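The dress example above is the simplest kind of forecast: assume the quantity bought scales in proportion to the budget. A minimal sketch, where the optional `price_ratio` parameter is a hypothetical addition that models the price rise the text warns about:

```python
def proportional_forecast(last_quantity, last_budget, new_budget, price_ratio=1.0):
    """Naive forecast: quantity grows with the budget and shrinks as prices rise.

    price_ratio is this year's price divided by last year's price (1.0 = no change).
    """
    return last_quantity * new_budget / last_budget / price_ratio

# Two dresses last year; the salary (budget) doubles this year
print(proportional_forecast(2, last_budget=1.0, new_budget=2.0))  # 4.0
# Same, but clothes now cost 25% more
print(proportional_forecast(2, last_budget=1.0, new_budget=2.0, price_ratio=1.25))  # 3.2
```

A real forecast would also have to account for changing priorities (the bike or the house), which a one-line ratio cannot capture.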

Prescriptive Analysis
Prescriptive Analysis combines the insights from all the previous types of analysis to determine
which action to take on a current problem or decision. Many data-driven companies use
prescriptive analysis because predictive and descriptive analysis alone are not enough to decide
what action to take. Based on current situations and problems, they analyse the data and make
decisions.

Data Analysis Process

The data analysis process consists of gathering information and using an appropriate application
or tool to explore the data and find patterns in it. Based on those patterns, you can make
decisions or reach final conclusions.

Data Analysis consists of the following phases:

• Data Requirement Gathering
• Data Collection
• Data Cleaning
• Data Analysis
• Data Interpretation
• Data Visualization
Data Requirement Gathering

First of all, think about why you want to do this data analysis: you need to find out the purpose
or aim of the analysis and decide which type of data analysis to carry out. In this phase, you
decide what to analyse and how to measure it; you need to understand why you are investigating
and which measures you will use in the analysis.

Data Collection

After requirement gathering, you will have a clear idea of what you have to measure and what you
expect to find. Now it is time to collect your data based on the requirements. Once you have
collected your data, remember that it must be processed or organized for analysis. As you collect
data from various sources, you must keep a log with the collection date and the source of the
data.

Data Cleaning

Not all of the collected data will be useful; some of it may be irrelevant to the aim of the
analysis, so it should be cleaned. The collected data may also contain duplicate records, extra
white spaces, or errors, and it should be made error free. This phase must be done before the
analysis, because the cleaner the data, the more reliable the output of the analysis will be.
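A minimal sketch of the cleaning step in Python, assuming hypothetical records of (name, marks out of 100). It trims white space, drops records whose marks are not valid numbers in the 0 to 100 range, and drops duplicates; real cleaning rules depend on the data being analysed.

```python
def clean_records(records):
    """Trim white space, drop invalid or out-of-range marks, and drop duplicates."""
    seen = set()
    cleaned = []
    for name, marks in records:
        name = name.strip()
        try:
            marks = int(str(marks).strip())
        except ValueError:
            continue  # error in the data: marks are not a whole number
        if not name or not 0 <= marks <= 100:
            continue  # missing name or marks outside 0..100
        key = (name.lower(), marks)
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        cleaned.append((name, marks))
    return cleaned

# Hypothetical raw data with white space, a duplicate, and two errors
raw = [(" Asha ", "72"), ("Asha", 72), ("Ravi", "n/a"), ("Meena", 150), ("Karan", " 64 ")]
print(clean_records(raw))  # [('Asha', 72), ('Karan', 64)]
```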

Data Analysis

Once the data is collected, cleaned, and processed, it is ready for Analysis. As you manipulate
data, you may find you have the exact information you need, or you might need to collect more
data. During this phase, you can use data analysis tools and software which will help you to
understand, interpret, and derive conclusions based on the requirements.

Data Interpretation
After analysing your data, it's finally time to interpret your results. You can choose how to
express or communicate your analysis: simply in words, or with a table or chart. Then use the
results of your data analysis process to decide your best course of action.

Data Visualization

Data visualizations are very common in day-to-day life; they often appear as charts and graphs.
In other words, data is shown graphically so that it is easier for the human brain to understand
and process. Data visualization is often used to discover unknown facts and trends: by observing
relationships and comparing datasets, you can uncover meaningful information.

Visualizing Data with Charts


Lab Assignment 2
Creating Charts in M.S. Excel: Graphic Displays for Qualitative Data

Introduction: Graphically representing data is one of the most helpful ways to become
acquainted with the sample data. In this lab you will use Excel to present data graphically. You
will analyse data using two types of graphs: circle graphs and bar graphs.

I. BAR CHART
The Bar Chart is like a Column Chart lying on its side. The horizontal axis of a Bar Chart contains the
numeric values. The first chart below is the Bar Chart for our single series, Flowers. When to use a Bar
Chart versus a Column Chart depends on the type of data and user preference. Sometimes it is worth
the time to create both charts and compare the results. However, Bar Charts do tend to display and
compare a large number of series better than the other chart types. A bar chart or bar graph is a chart
with rectangular bars with lengths proportional to the values that they represent. The bars can be plotted
vertically or horizontally. A vertical bar chart is sometimes called a column bar chart. Bar charts provide
a visual presentation of categorical data. Categorical data is a grouping of data into discrete groups,
such as months of the year, age groups, shoe sizes, and animals. In a column bar chart, the categories
appear along the horizontal axis; the height of the bar corresponds to the value of each category.

Exercise. 12 students were given intensive coaching, and two tests, English and Accounts, were
conducted in different years. The test scores are given below:

Year      2001  2002  2003  2004  2005  2006  2007  2008  2009  2010
English     35    56    42    35    61    76    48    83    86    93
Accounts    42    65    75    84    64    81    73    63    86    95

Let’s continue to use the data from Exercise 1 and present it as a bar graph. Since we already
have the data entered, we can go straight to the commands to create the bar graph:
Highlight the data. From the ribbon, select the Insert tab > Column > 2-D Column.
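Excel draws the chart for you, but the rule a bar chart follows (bar length proportional to the value) can be sketched in a few lines of Python. This is only a text rendering of the English and Accounts scores from the exercise table, not a replacement for the Excel chart; the 40-character maximum width is an arbitrary choice.

```python
years = [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010]
english = [35, 56, 42, 35, 61, 76, 48, 83, 86, 93]
accounts = [42, 65, 75, 84, 64, 81, 73, 63, 86, 95]

def text_bar_chart(labels, values, width=40):
    """One text bar per category; each bar's length is proportional to its value."""
    top = max(values)
    return [f"{label} | {'#' * round(value / top * width)} {value}"
            for label, value in zip(labels, values)]

for subject, scores in (("English", english), ("Accounts", accounts)):
    print(subject)
    for line in text_bar_chart(years, scores):
        print(line)
```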
II. CIRCLE GRAPHS

A circle graph shows the amount of data that belongs to each category as a proportional part
of a circle. Consider Example 1. We are instructed to construct a circle graph, with the data
presented as a frequency distribution.
Enter the data (either by hand or by opening the data file).
Highlight the data. From the ribbon, select the Insert tab > Pie > 2-D Pie.
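Each slice of a circle graph is the category's share of the total, applied to 360 degrees. A minimal sketch of that calculation; the frequency distribution below is hypothetical, since Example 1's data is not reproduced here.

```python
def pie_slices(frequencies):
    """Convert a frequency distribution into (share %, angle in degrees) per category."""
    total = sum(frequencies.values())
    return {
        category: (100 * count / total, 360 * count / total)
        for category, count in frequencies.items()
    }

# Hypothetical frequency distribution
favourite_subject = {"English": 12, "Accounts": 18, "Economics": 6, "Law": 4}
for category, (share, angle) in pie_slices(favourite_subject).items():
    print(f"{category}: {share:.1f}% -> {angle:.1f} degrees")
```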
Lab Assignment 3
Exercise: Making a Histogram in Excel

HISTOGRAMS:
Histograms are most useful for large sets of data. A histogram is a common data analysis tool
in the business world: it is a column chart that shows how often the values of a variable occur
within specified ranges.
A simple example of a histogram is the distribution of marks scored in a subject. You can easily
create a histogram and see how many students scored less than 35, how many scored between 35 and
50, how many between 50 and 60, and so on. Note: Data continues down the column.
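The counting behind such a histogram can be sketched in Python. The marks are hypothetical, and the sketch assumes that a mark equal to a boundary (for example exactly 50) belongs to the range that starts there; the text's "between 35-50" leaves that choice open.

```python
from bisect import bisect_right

def frequency_table(marks, boundaries):
    """Count how many marks fall in each range defined by sorted boundaries.

    With boundaries [35, 50, 60] the ranges are: below 35, 35 to under 50,
    50 to under 60, and 60 or more. A mark equal to a boundary is counted in
    the range that starts at that boundary.
    """
    counts = [0] * (len(boundaries) + 1)
    for mark in marks:
        counts[bisect_right(boundaries, mark)] += 1
    return counts

marks = [28, 35, 41, 50, 55, 59, 60, 72, 88, 33]  # hypothetical marks
print(frequency_table(marks, [35, 50, 60]))  # [2, 2, 3, 3]
```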

Select the entire dataset > click the Insert tab > in the Charts group, click the
‘Insert Statistic Chart’ option and choose Histogram.
You can now customize this chart by right-clicking on the horizontal axis and selecting Format Axis.

Here are some of the options for customizing the histogram chart:

1. By Category: This option is used when we have text categories. It is useful when
categories repeat and you want to know the sum or count for each category.
2. Automatic: This option automatically decides which bins to create in the histogram.
For example, in our chart, it decided that there should be four bins. You can change
this by using the ‘Bin Width/Number of Bins’ options (covered below).

3. Bin Width: Here you can define how big each bin should be. If I enter 20 here, it will
create bins such as 36-56, 56-76, 76-96, 96-116.
4. Number of Bins: Here you can specify how many bins you want, and the chart will be created
with that many bins. For example, if I specify 7 here, it will create a chart as shown
below. At any given time, you can specify either Bin Width or Number of Bins (not both).
5. Overflow Bin: Use this bin if you want all the values above a certain value grouped together
in the histogram. For example, if I want to know the number of students who scored
more than 75, I can enter 75 as the Overflow Bin value.
6. Underflow Bin: Similar to Overflow Bin, if I want to know the number of students who
scored less than 40, I can enter 40 as the Underflow Bin value.
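The bin options above can be imitated in Python to see what they do. This is a sketch of the idea only: it uses half-open bins [a, b) and pools values at or below the underflow value and above the overflow value, while Excel's own boundary rules (which side of an edge a value falls on, and where the regular bins start once the flow bins are set) may differ. The marks are hypothetical; the range 36 to 100 with width 20 reproduces the 36-56, 56-76, 76-96, 96-116 bins from option 3.

```python
import math
from bisect import bisect_right

def make_bins(low, high, bin_width=None, num_bins=None):
    """Equal-width bin edges covering low..high. Give bin_width OR num_bins, not both."""
    if (bin_width is None) == (num_bins is None):
        raise ValueError("specify exactly one of bin_width or num_bins")
    if num_bins is not None:
        bin_width = (high - low) / num_bins
    else:
        num_bins = max(1, math.ceil((high - low) / bin_width))
    return [low + i * bin_width for i in range(num_bins + 1)]

def count_with_flow_bins(values, edges, underflow=None, overflow=None):
    """Return (underflow_count, bin_counts, overflow_count).

    Values <= underflow or > overflow are pooled into the two extra bins.
    Other values go into half-open bins [edge_i, edge_i+1); a value equal to
    the last edge is counted in the last bin.
    """
    under = over = 0
    counts = [0] * (len(edges) - 1)
    for v in values:
        if underflow is not None and v <= underflow:
            under += 1
        elif overflow is not None and v > overflow:
            over += 1
        elif edges[0] <= v <= edges[-1]:
            i = min(bisect_right(edges, v) - 1, len(counts) - 1)
            counts[i] += 1
        else:
            raise ValueError(f"{v} lies outside the bin edges")
    return under, counts, over

marks = [36, 40, 45, 58, 62, 70, 75, 80, 91, 100]  # hypothetical marks
edges = make_bins(36, 100, bin_width=20)
print(edges)                                # [36, 56, 76, 96, 116]
print(count_with_flow_bins(marks, edges))   # (0, [3, 4, 2, 1], 0)
print(count_with_flow_bins(marks, edges, underflow=40, overflow=75))  # (2, [1, 4, 0, 0], 3)
```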
Lab Assignment 4
T-test: Introduction and T-test in Excel

T-TEST
A t-test is a type of inferential statistic used to determine whether there is a significant
difference between the means of two groups, which may be related in certain features. It is
mostly used when the data sets would follow a normal distribution and may have unknown
variances. A t-test is used as a hypothesis testing tool, which allows testing of an assumption
applicable to a population.

A t-test uses the t-statistic, the t-distribution values and the degrees of freedom to determine
whether the difference between two sets of data is statistically significant. To compare the
means of three or more groups, one must use an analysis of variance (ANOVA).
Two-Sample T-Test Assumptions
The assumptions of the two-sample t-test are:
1. The data are continuous (not discrete).
2. The data follow the normal probability distribution.
3. The variances of the two populations are equal. (If not, the Aspin-Welch Unequal-Variance
test is used.)
4. The two samples are independent. There is no relationship between the individuals in one
sample as compared to the other (as there is in the paired t-test).
5. Both samples are simple random samples from their respective populations. Each individual
in the population has an equal probability of being selected in the sample.
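Assumption 3 decides which formula applies. A minimal Python sketch of both: Student's pooled t-test when the variances are equal, and Welch's t-test (the Aspin-Welch unequal-variance test mentioned above) when they are not. Excel offers the matching tools "t-Test: Two-Sample Assuming Equal Variances" and "t-Test: Two-Sample Assuming Unequal Variances". The group marks are hypothetical.

```python
import math
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Student's two-sample t (equal variances assumed). Returns (t, df)."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    t = (mean(sample1) - mean(sample2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def welch_t(sample1, sample2):
    """Welch's t for unequal variances, with Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    a, b = variance(sample1) / n1, variance(sample2) / n2
    t = (mean(sample1) - mean(sample2)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

group_a = [62, 70, 68, 75, 71]  # hypothetical marks
group_b = [58, 64, 61, 66, 60]
print(pooled_t(group_a, group_b))
print(welch_t(group_a, group_b))
```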
T-TEST ON EXCEL
22 persons were appointed to an officer grade in an office, and their performance was recorded
through a test marked out of 100. They were then given 4 months of training and tested again.
Using a t-test, determine whether the employees benefited from the training. The data is as
follows:

The first step is to develop the hypotheses: H0, the null hypothesis, and H1, the alternative
hypothesis.
H0: The training is not beneficial.
H1: The training is beneficial.

Significance level: 5%
Selection of test: In this case we use a t-test, as there are fewer than 30 observations.

STEPS IN EXCEL:
1) Click on the Data tab on the menu bar.
2) Click on Data Analysis (available once the Analysis ToolPak add-in is enabled) and then
select t-Test: Paired Two Sample for Means.
3) The following dialogue box will appear.
4) Fill in the input ranges with the cells and select the cell where the output is to be shown.
5) Click OK.
6) The following is the table you will get.
CONCLUSION:
T CAL VALUE: TAB VALUE
1.42: 2.08
1.42 < 2.08

As the table value is more than the calculated t value, we accept (fail to reject) the null
hypothesis: at the 5% level, the test does not show that the training is beneficial.
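Because the same employees are tested before and after training, the lab uses the paired version of the t-test, which works on each employee's difference between the two marks. A minimal Python sketch follows, assuming hypothetical marks for five employees (not the lab's data). The decision helper applies the two-tailed rule to the values reported in the conclusion; 2.08 matches the two-tail 5% critical value for 21 degrees of freedom, which is what 22 paired observations give.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic on the differences (before - after). Returns (t, df)."""
    if len(before) != len(after):
        raise ValueError("each employee needs a before and an after mark")
    diffs = [pre - post for pre, post in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

def two_tailed_decision(statistic, critical_value):
    """Reject H0 only when |statistic| exceeds the critical (table) value."""
    return "reject H0" if abs(statistic) > critical_value else "accept (fail to reject) H0"

before = [55, 60, 48, 70, 65]  # hypothetical marks before training
after = [58, 62, 50, 69, 70]   # hypothetical marks after training
print(paired_t(before, after))

# The values reported in the conclusion
print(two_tailed_decision(1.42, 2.08))  # accept (fail to reject) H0
```

Here the differences are taken as before minus after, so a negative t means the marks went up after training.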
Lab Assignment 5
Z-test in Excel

Z-Test: Introduction
A z-test is a statistical test used to determine whether two population means are different when
the variances are known and the sample size is large. The test statistic is assumed to have a
normal distribution, and nuisance parameters such as standard deviation should be known in
order for an accurate z-test to be performed.
A z-test is used to compare the mean of a normal random variable to a specified value, μ0. But
don't get hung up on the "normal random variable" part: z-tests can also be used when the data
is generated from other distributions, such as the binomial and Poisson, because for large
samples the sample mean (like other maximum likelihood estimators) is approximately normally
distributed.
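A minimal sketch of the one-sample z-test described above: it compares a sample mean with μ0 (written `mu0` in the code) when the population standard deviation σ is known. The sample and σ = 4 are hypothetical; `NormalDist` needs Python 3.8 or later.

```python
import math
from statistics import NormalDist, mean

def one_sample_z(sample, mu0, sigma):
    """z statistic for H0: population mean = mu0, with KNOWN population std dev sigma."""
    return (mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))

def two_sided_p_value(z):
    """Probability of a standard normal value at least as extreme as |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

marks = [52, 48, 50, 54]        # hypothetical sample
z = one_sample_z(marks, mu0=50, sigma=4)
print(z, two_sided_p_value(z))  # z is 0.5 here
```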

Z-TEST ON EXCEL
34 persons were appointed to an officer grade in an office, and their performance was recorded
through a test marked out of 100. They were then given 4 months of training and tested again.
Using a z-test, determine whether the employees benefited from the training. The data is as
follows:
The first step is to develop the hypotheses: H0, the null hypothesis, and H1, the alternative
hypothesis.

H0: The training is not beneficial.
H1: The training is beneficial.

Significance level: 5%
Selection of test: In this case we use a z-test, as there are more than 30 observations.

STEPS IN EXCEL:
1) Click on the Data tab on the menu bar.
2) Click on Data Analysis and then select z-Test: Two Sample for Means.
3) The following dialogue box will appear.
4) Fill in the input ranges with the cells, enter the known variance of each variable, and
select the cell where the output is to be shown.
5) Click OK.
6) The following is the table you will get.
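CONCLUSION:
Z CAL VALUE: TAB VALUE (TWO-TAIL)
-1.98: 1.95
|-1.98| = 1.98 > 1.95
In a two-tail test, the absolute value of the calculated z is compared with the table value. As
1.98 is more than 1.95, the difference between the two sets of marks is significant at the 5%
level, so we reject the null hypothesis. The negative sign means the mean of Variable 1 is lower
than the mean of Variable 2: if Variable 1 holds the marks before training, the marks rose after
training, which supports H1 (the training is beneficial).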
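The statistic behind a two-sample z-test with known variances can be sketched as follows, together with the two-tail critical value of the standard normal distribution (about 1.96 at the 5% level). The marks and the "known" variances are hypothetical; in practice the variances must come from outside the sample, which is why Excel's dialog asks for them.

```python
import math
from statistics import NormalDist, mean

def two_sample_z(sample1, sample2, var1, var2, hypothesized_diff=0.0):
    """z for the difference of two means when both population variances are KNOWN."""
    se = math.sqrt(var1 / len(sample1) + var2 / len(sample2))
    return (mean(sample1) - mean(sample2) - hypothesized_diff) / se

def z_critical_two_tail(alpha=0.05):
    """Two-tail critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)

before = [55, 60, 48, 70, 65, 58]  # hypothetical marks
after = [61, 63, 52, 72, 70, 60]
z = two_sample_z(before, after, var1=60.0, var2=55.0)  # hypothetical known variances
print(round(z, 2), round(z_critical_two_tail(), 2))
```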
Lab Assignment 3
Calculating the Pearson Correlation Coefficient with Excel

Correlation means association; more precisely, it is a measure of the extent to which two
variables are related. There are three possible results of a correlational study: a positive
correlation, a negative correlation, and no correlation.
A positive correlation is a relationship between two variables in which both variables
either increase or decrease at the same time. An example would be height and weight. Taller
people tend to be heavier.
A negative correlation is a relationship between two variables in which an increase in one
variable is associated with a decrease in the other. An example would be height above sea
level and temperature. As you climb the mountain (increase in height) it gets colder (decrease
in temperature).
A zero correlation exists when there is no relationship between two variables. For example,
there is no relationship between the amount of tea drunk and the level of intelligence.
A correlation can be expressed visually by drawing a scattergram (scatter plot): that is, one
plots the figures for one variable against the figures for the other on a graph.
Some uses of correlations: prediction, validity, reliability, theory verification, and
predictive validity.
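Excel's CORREL function and the Correlation tool both report the Pearson coefficient r. The formula behind it can be sketched in Python as below; using the English and Accounts scores from the exercise in Lab Assignment 2 is this sketch's own choice of example data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

english = [35, 56, 42, 35, 61, 76, 48, 83, 86, 93]
accounts = [42, 65, 75, 84, 64, 81, 73, 63, 86, 95]
print(round(pearson_r(english, accounts), 2))  # 0.49
```

For these scores r is about 0.49, a positive correlation: years with higher English marks tended to have higher Accounts marks, though far from perfectly.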
Correlation coefficient(s) with Excel
1. Data tab → Data Analysis → Correlation (or Insert Function → Statistical, e.g. CORREL).
2. Highlight all the columns containing variables you suspect are correlated.
3. Check Labels in First Row only if you highlighted the top row labels.
4. Click Output Range and select a place for the output by clicking on a cell. It will create
an n × n array, where n is the number of variables (columns).
5. Click OK. This produces an array of correlation coefficients between all of the variables
represented by the columns, which is useful for seeing which of many variables are most
strongly correlated.
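The n × n array from the Correlation tool can be sketched as a matrix of pairwise Pearson coefficients. Excel displays only the lower half, since the matrix is symmetric with 1s on the diagonal; this sketch prints it in full. The three columns (Year, English, Accounts) are taken from the Lab Assignment 2 exercise purely as example data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return (names, n x n matrix) of pairwise correlations between named columns."""
    names = list(columns)
    matrix = [[pearson_r(columns[a], columns[b]) for b in names] for a in names]
    return names, matrix

data = {
    "Year": [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010],
    "English": [35, 56, 42, 35, 61, 76, 48, 83, 86, 93],
    "Accounts": [42, 65, 75, 84, 64, 81, 73, 63, 86, 95],
}
names, matrix = correlation_matrix(data)
print(" " * 9 + "".join(f"{n:>10}" for n in names))
for name, row in zip(names, matrix):
    print(f"{name:<9}" + "".join(f"{r:>10.2f}" for r in row))
```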