STATISTICAL DATA ANALYSIS
A Lokeshwari
22N31E0014
DEFINE CORRELATION
Correlation is a statistical measure of the strength and direction of the relationship
between two variables. A correlation coefficient is a number between -1 and 1: its sign
gives the direction of the relationship and its magnitude gives the strength. A correlation
coefficient of 1 indicates a perfect positive correlation, a correlation coefficient
of -1 indicates a perfect negative correlation, and a correlation coefficient of 0
indicates no linear correlation.
THERE ARE TWO MAIN TYPES OF CORRELATION
Linear correlation: This type of correlation measures the strength of the linear relationship
between two variables. A linear relationship is one in which the points on a scatter plot
fall along a straight line.

Non-linear correlation: This type of correlation measures the strength of the non-linear
relationship between two variables. A non-linear relationship is one in which the points
on a scatter plot follow a curve rather than a straight line.
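
The difference between the two types can be seen numerically. The following is a minimal sketch and not part of the original slides: the helper name `pearson` and the sample values are assumptions chosen for illustration. It computes the usual (Pearson) correlation coefficient for points on a straight line and for points on a symmetric curve.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]  # points on a straight line
curved = [x ** 2 for x in xs]     # points on a parabola

r_linear = pearson(xs, linear)
r_curved = pearson(xs, curved)
print(round(r_linear, 4), round(r_curved, 4))
```

In the curved case y is fully determined by x, yet the coefficient comes out at zero, because the coefficient only captures the straight-line part of a relationship. This is why it helps to look at a scatter plot alongside the number.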
TO CALCULATE CORRELATION USING EXCEL
 We can use the CORREL function. The CORREL function takes two arrays (cell ranges) of equal
length as input and returns their correlation coefficient. For example, if "height" and "weight"
are named ranges in a dataset, the following formula calculates the correlation coefficient
between them:

 =CORREL(height, weight)

 The CORREL function is a powerful tool that can be used to analyze the relationship between
two variables. However, it is important to remember that correlation does not imply causation.
Just because two variables are correlated does not mean that one causes the other.
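
For readers who want to see what the formula computes, here is a minimal sketch of the same calculation outside Excel. The function name `correl` and the height and weight values are hypothetical and chosen only for illustration. The arithmetic is the standard Pearson correlation coefficient, which is what CORREL returns.

```python
import math

# Hypothetical sample: heights in cm and weights in kg for five adults.
height = [150, 160, 170, 180, 190]
weight = [52, 58, 69, 77, 88]

def correl(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys):
        raise ValueError("both ranges must have the same number of values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = correl(height, weight)
print(round(r, 3))
```

A value close to 1 for this sample says only that taller people in the sample tend to be heavier. It says nothing about why.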
EXAMPLES OF CORRELATION
 The height and weight of adults are positively correlated. This means that as people get taller,
they tend to get heavier.
 The price of gasoline and the number of miles driven are negatively correlated. This means
that as the price of gasoline goes up, people tend to drive less.
 The number of hours students study and their grades are positively correlated. This means that
as students study more, they tend to get better grades.
 Task: Create a dataset with 10 rows and 25 columns, sort it in ascending and descending
order, apply a filter, and then protect the worksheet.

 Here are the steps to create a dataset with 10 rows and 25 columns, sort it in ascending and
descending order, apply a filter, and protect the worksheet:
 Open Excel.
 Create a new workbook.
 In the first row, enter the column headers.
 In the remaining rows, enter random data.
 To sort the data in ascending order, select a cell in the data, click on the Data tab, and then
click on the Sort button in the Sort & Filter group.
 In the Sort dialog box, select the column that you want to sort by and then choose the
ascending order (Smallest to Largest, or A to Z for text).
 To sort the data in descending order, choose the descending order instead (Largest to Smallest,
or Z to A for text).
 To apply a filter, click on the Data tab and then click on the Filter button.
 Click on the drop-down arrow that appears in the header of the column that you want to filter
by and then select the criteria that you want to use.
 To protect the worksheet, click on the Review tab and then click on the Protect Sheet button.

 In the Protect Sheet dialog box, enter a password and then click on the OK button.

 In short, once the data is created, Sort and Filter are both on the Data tab, and Protect Sheet
is on the Review tab.
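
The same create, sort, and filter workflow can be sketched in code. This is a minimal illustration and not the Excel procedure itself: the column names (`Col1` to `Col25`), the helper functions `sort_rows` and `filter_rows`, and the random values are all assumptions made for the example. Sheet protection is an Excel feature and is not covered here.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# 25 column headers (hypothetical names) and 10 rows of random integers.
headers = [f"Col{i}" for i in range(1, 26)]
rows = [[random.randint(1, 100) for _ in headers] for _ in range(10)]

def sort_rows(rows, headers, column, descending=False):
    """Return the rows sorted by one column."""
    idx = headers.index(column)
    return sorted(rows, key=lambda row: row[idx], reverse=descending)

def filter_rows(rows, headers, column, predicate):
    """Keep only the rows whose value in `column` satisfies `predicate`."""
    idx = headers.index(column)
    return [row for row in rows if predicate(row[idx])]

ascending = sort_rows(rows, headers, "Col1")
descending = sort_rows(rows, headers, "Col1", descending=True)
above_50 = filter_rows(rows, headers, "Col1", lambda value: value > 50)

print([row[0] for row in ascending])
print([row[0] for row in descending])
print(len(above_50), "rows have Col1 > 50")
```

As in Excel, sorting moves whole rows together, so the values in the other 24 columns stay attached to the row they belong to.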
 Here are the steps to create different pivot tables from the data created above
and apply different colors to them:

 Select the data that you want to create a pivot table from.

 Click on the Insert tab and then click on the PivotTable button.

 In the PivotTable dialog box, select the Existing Worksheet option and then choose the
location where you want to place the pivot table.
 Click on the OK button.

 In the pivot table, drag the column headers that you want to summarize to the
Rows, Columns, or Values areas.

 To apply different colors to the pivot table, click on any cell in the pivot table
and then open the Design tab.

 In the PivotTable Styles gallery, click on the style that you want to use. The style
is applied as soon as you select it.
 Following these steps, you can create, for example:
 A pivot table that summarizes the data by column.

 A pivot table that summarizes the data by row.

 A pivot table that summarizes the data by value.

 A pivot table that summarizes the data by multiple criteria.

 Once you have created the pivot tables, you can apply different colors to them to make
them easier to read and understand.
 Here are some tips for applying different colors to pivot tables:

 Use different colors for different types of data.

 Use different shades of the same color to create a sense of hierarchy.

 Use bright colors to highlight important data.

 Use muted colors to de-emphasize less important data, by selecting fill colours.
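
What a pivot table does when fields are dragged to the Rows, Columns, and Values areas can be sketched in a few lines of code. This is a minimal illustration under stated assumptions: the records (region, category, sales) and the function name `pivot_sum` are hypothetical and are not taken from the dataset created above.

```python
from collections import defaultdict

# Hypothetical records: (region, category, sales).
records = [
    ("East", "Furniture", 120),
    ("East", "Technology", 300),
    ("West", "Furniture", 80),
    ("West", "Technology", 150),
    ("East", "Furniture", 60),
]

def pivot_sum(records):
    """Sum the value for each (row label, column label) pair."""
    table = defaultdict(lambda: defaultdict(int))
    for row_label, col_label, value in records:
        table[row_label][col_label] += value
    return {row: dict(cols) for row, cols in table.items()}

pivot = pivot_sum(records)
for region in sorted(pivot):
    print(region, pivot[region])
```

Here the region plays the role of the Rows field, the category the Columns field, and the sales figure the Values field, summarized by sum.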
 Task: Create a dataset with 10 rows and 10 columns (10 different variables) and identify the
correlations in the created data.
 Here are the steps to create a dataset with 10 rows and 10 columns, one column per
variable, and identify the correlations in the created data with Excel:
 Open Excel.
 Create a new workbook.
 In the first row, enter the column headers.
 In the remaining rows, enter random data.
 To calculate the correlations, click on the Data tab and then click on the Data Analysis button
(this requires the Analysis ToolPak add-in to be enabled).
 In the Data Analysis dialog box, select Correlation and then click on the OK button.
 In the Correlation dialog box, enter the range of the data (including the headers) as the
Input Range, check the Labels in First Row box, and then click on the OK button.
 The Pearson correlation coefficients for every pair of columns will be displayed in a table.
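
The table of coefficients that results is a correlation matrix: one coefficient for every pair of variables. A minimal sketch of how such a table can be computed follows. The variable names (`Var1` to `Var10`), the helper `pearson`, and the random values are assumptions made for illustration only.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# 10 hypothetical variables (columns), each with 10 random observations (rows).
names = [f"Var{i}" for i in range(1, 11)]
data = {name: [random.uniform(0, 100) for _ in range(10)] for name in names}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# One coefficient per pair of variables.
matrix = {a: {b: pearson(data[a], data[b]) for b in names} for a in names}

# Print the lower triangle; the upper triangle mirrors it.
for i, a in enumerate(names):
    print(a, " ".join(f"{matrix[a][b]:6.2f}" for b in names[: i + 1]))
```

Because the values are random, most off-diagonal coefficients should be fairly small, though with only 10 rows some pairs can look moderately correlated by chance.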
 Here are some tips for interpreting the correlation coefficients:

 A correlation coefficient of 1 indicates a perfect positive correlation.


 A correlation coefficient of -1 indicates a perfect negative correlation.
 A correlation coefficient of 0 indicates no linear correlation.
 Task: What are the mean, median, and mode? Showcase the answer using the Global Superstore 2016 data.

 Mean, median, and mode are three measures of central tendency used in statistics to describe a
set of data.
 Mean: The mean, also known as the average, is calculated by summing all the values in a
dataset and dividing the sum by the number of data points. It represents the "typical" value of
the data.
 Median: The median is the middle value in a dataset when the data is arranged in ascending or
descending order. If the dataset has an odd number of data points, the median is the middle
value. If the dataset has an even number of data points, the median is the average of the two
middle values.
 Mode: The mode is the value that occurs most frequently in a dataset. It represents the most
common value or category in the data.
 To showcase the mean, median, and mode, let's use a subset of the Global Superstore 2016
data. Assume we have the following dataset representing the sales of a specific product:

 {100, 150, 200, 250, 300, 300, 350}


 Mean:
 Mean = (100 + 150 + 200 + 250 + 300 + 300 + 350) / 7
 Mean = 1650 / 7
 Mean ≈ 235.71
 Median:
 Since there are 7 data points, the median is the fourth value in sorted order, which is 250.
 Mode:
 The mode is the value that appears most frequently. In this dataset, the mode is 300,
as it appears twice, more than any other value.
 So, for this subset of the Global Superstore 2016 data, the mean is approximately
235.71, the median is 250, and the mode is 300.
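
These three measures can be checked with a short script. The sketch below uses Python's standard `statistics` module on the same seven values. The variable name `sales` is an assumption made for the example.

```python
import statistics

sales = [100, 150, 200, 250, 300, 300, 350]

mean = statistics.mean(sales)      # sum of the values divided by their count
median = statistics.median(sales)  # middle value of the sorted data
mode = statistics.mode(sales)      # most frequent value

print(round(mean, 2), median, mode)  # prints: 235.71 250 300
```

The mean is pulled below the median here because the two smallest values (100 and 150) lie further from the centre than the largest ones do.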
 Task: How could we present the standard deviation, variance, covariance, and dispersion of
the above-mentioned dataset?
 To present the standard deviation, variance, covariance, and dispersion for the
given dataset {100, 150, 200, 250, 300, 300, 350}, follow these steps:
 Variance:
 The variance measures how far the data points in the dataset are spread out
from the mean. The population variance is calculated as the average of the squared
differences between each data point and the mean.
 Variance = [(100 - Mean)^2 + (150 - Mean)^2 + (200 - Mean)^2 + (250 - Mean)^2 + (300 -
Mean)^2 + (300 - Mean)^2 + (350 - Mean)^2] / Number of data points
 Using the mean of the dataset (1650 / 7 ≈ 235.71):
 Variance = [(100 - 235.71)^2 + (150 - 235.71)^2 + (200 - 235.71)^2 + (250 - 235.71)^2 +
(300 - 235.71)^2 + (300 - 235.71)^2 + (350 - 235.71)^2] / 7
 Variance ≈ 48571.43 / 7 ≈ 6938.78
 Standard Deviation:
 The standard deviation is the square root of the variance. It measures the typical distance of
the data points from the mean, in the same units as the data.
 Standard Deviation = √Variance ≈ √6938.78 ≈ 83.30
 Covariance:
 The covariance measures the degree to which two variables (datasets)
change together. It is used to determine the relationship between two
datasets.
 Covariance between two datasets X and Y = Σ [(Xi - Mean of X) * (Yi -
Mean of Y)] / Number of data points
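 If we consider the dataset {100, 150, 200, 250, 300, 300, 350} as both X
and Y (they are the same for simplicity), the mean of each is 1650 / 7 ≈ 235.71, and the
covariance of the dataset with itself equals its variance:
 Covariance = [(100 - 235.71) * (100 - 235.71) + (150 - 235.71) * (150 - 235.71) + ... + (350 -
235.71) * (350 - 235.71)] / 7
 Covariance ≈ 48571.43 / 7 ≈ 6938.78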
 Dispersion:
 Dispersion refers to the extent of variability or spread of data in a dataset. The
standard deviation calculated earlier is one way to measure dispersion. Another
common measure of dispersion is the range, which is the difference between the
largest and smallest values in the dataset.
 Range = Largest value - Smallest value
 Range = 350 - 100 = 250
 To summarize:
 Variance ≈ 6938.78
 Standard Deviation ≈ 83.30
 Covariance ≈ 6938.78
 Range =250
 These measures provide insights into the spread and relationships within
the dataset.
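
The spread measures can be verified with a short script. This is a minimal sketch: the variable names and the helper `population_covariance` are assumptions made for the example, and all measures use the population form (dividing by the number of data points), matching the formulas given above.

```python
import statistics

sales = [100, 150, 200, 250, 300, 300, 350]

def population_covariance(xs, ys):
    """Average product of deviations from the means (divides by n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

variance = statistics.pvariance(sales)             # population variance
std_dev = statistics.pstdev(sales)                 # population standard deviation
covariance = population_covariance(sales, sales)   # dataset paired with itself
data_range = max(sales) - min(sales)

print(round(variance, 2), round(std_dev, 2), round(covariance, 2), data_range)
```

Pairing the dataset with itself makes the covariance equal to the variance. A covariance that says something about a relationship needs two different variables, such as sales and profit.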
