
RESEARCH METHODOLOGY

(Manual Statistical Analysis)

UNDER THE GUIDANCE OF:

Mrs. Nilima Regina Topno (Associate Professor)

SUBMITTED BY:

Anand Kumar

Chanpreet Singh

Khushboo Kumari

Sonal

DEPARTMENT OF FASHION TECHNOLOGY 2017-2021

NIFT PATNA

ACKNOWLEDGEMENT

We would like to thank the National Institute of Fashion Technology for giving us the
opportunity to take up subjects like Research Methodology and Data Analytics & R.

Foremost, we would like to thank our subject faculty, Mrs. Nilima Regina Topno, for her
invaluable feedback and guidance throughout the semester. This assignment could not
have been completed without her support.

We acknowledge with a deep sense of gratitude the encouragement and inspiration received
from our faculty in providing us relevant information.

- ANAND KUMAR
- CHANPREET SINGH
- KHUSHBOO KUMARI
- SONAL

OBJECTIVE
Manual calculations for sampling, mean, median, mode, range, ANOVA, hypothesis testing,
the regression line, the chi-square test, the bell curve, and other manual methods, applied
to the iris data from our previous research.

CRITERIA OF SELECTING A SAMPLING PROCEDURE

Two costs are involved in a sampling analysis, viz., the cost of collecting the data
and the cost of an incorrect inference resulting from the data. The researcher must keep in
view the two causes of incorrect inferences, viz., systematic bias and sampling error. A
systematic bias results from errors in the sampling procedures, and it cannot be reduced or
eliminated by increasing the sample size. At best, the causes responsible for these errors can
be detected and corrected. Usually a systematic bias is the result of one or more of the
following factors:

1. Inappropriate sampling frame: If the sampling frame is inappropriate, i.e., a biased
representation of the universe, it will result in a systematic bias.

2. Defective measuring device: If the measuring device is constantly in error, it will result in
systematic bias. In survey work, systematic bias can result if the questionnaire or the
interviewer is biased. Similarly, if the physical measuring device is defective there will be
systematic bias in the data collected through such a measuring device.

3. Non-respondents: If we are unable to sample all the individuals initially included in the
sample, there may arise a systematic bias. The reason is that in such a situation the likelihood
of establishing contact or receiving a response from an individual is often correlated with the
measure of what is to be estimated.

4. Indeterminacy principle: Sometimes we find that individuals act differently when kept
under observation than they do in non-observed situations. For instance, if workers are
aware that somebody is observing them in the course of a work study, on the basis of which
the average time to complete a task will be determined and a piece-work quota set
accordingly, they generally tend to work more slowly than they would if unobserved. Thus,
the indeterminacy principle may also be a cause of a systematic bias.

5. Natural bias in the reporting of data: Natural bias of respondents in the reporting of data is
often the cause of a systematic bias in many inquiries. There is usually a downward bias in
the income data collected by the government taxation department, whereas we find an
upward bias in the income data collected by social organisations. People in general
understate their incomes when asked for tax purposes, but overstate them when asked about
social status or affluence. Generally, in psychological surveys people tend to give what they
think is the 'correct' answer rather than revealing their true feelings.

Sampling errors are the random variations in the sample estimates around the true
population parameters. Since they occur randomly and are equally likely to be in either
direction, they are compensatory in nature and their expected value is zero. Sampling error
decreases as the sample size increases, and it is of smaller magnitude for a homogeneous
population.
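The claim that sampling error shrinks as the sample grows can be illustrated with a small simulation. This is not part of the original analysis; the population, seed, and sample sizes below are arbitrary illustrative choices:

```python
import random
import statistics

random.seed(1)
# Hypothetical population of 10,000 values (mean 50, sd 10); purely illustrative.
population = [random.gauss(50, 10) for _ in range(10_000)]

def spread_of_sample_means(n, trials=500):
    """Standard deviation of the sample mean over repeated samples of size n."""
    means = [statistics.mean(random.sample(population, n)) for _ in range(trials)]
    return statistics.stdev(means)

# Sampling error is compensatory (centred on zero) and shrinks as n grows.
print(spread_of_sample_means(10) > spread_of_sample_means(100))   # True
```

The spread of the estimate falls roughly in proportion to the square root of the sample size, which is why the larger sample gives the tighter estimate.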

COMPLEX RANDOM SAMPLING DESIGN (applied sampling)

Systematic sampling
The most practical way of sampling is to select every i-th item on a list; sampling of this
type is known as systematic sampling. An element of randomness is introduced into this
kind of sampling by using random numbers to pick the unit with which to start. For instance,
if a 4 per cent sample is desired, the first item would be selected randomly from the first
twenty-five and thereafter every 25th item would automatically be included in the sample.
Thus, in systematic sampling only the first unit is selected randomly and the remaining units
of the sample are selected at fixed intervals. Although a systematic sample is not a random
sample in the strict sense of the term, it is often considered reasonable to treat a systematic
sample as if it were a random sample.

Systematic sampling has certain plus points. It can be taken as an improvement over a
simple random sample in as much as the systematic sample is spread more evenly over the
entire population. It is an easier and less costly method of sampling and can be conveniently
used even in case of large populations.

But there are certain dangers too in using this type of sampling. If there is a hidden
periodicity in the population, systematic sampling will prove to be an inefficient method of
sampling. For instance, suppose every 25th item produced by a certain production process is
defective. If we select a 4% sample of the items of this process in a systematic manner, we
would get either all defective items or all good items in our sample, depending upon the
random starting position. If all elements of the universe are ordered in a manner
representative of the total population, i.e., the population list is in random order, systematic
sampling is considered equivalent to random sampling. But if this is not so, then the results of
such sampling may, at times, not be very reliable.

In practice, systematic sampling is used when lists of population are available and they are of
considerable length.
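The procedure described above can be sketched in a few lines of Python. This is an illustrative sketch only: the population list is a stand-in for the 150 iris rows, and the interval k = 15 matches the sample drawn below; only the starting unit is chosen at random.

```python
import random

def systematic_sample(items, k):
    """Pick a random start among the first k items, then every k-th item after it."""
    start = random.randrange(k)   # only the first unit is chosen at random
    return items[start::k]        # remaining units follow at fixed intervals of k

random.seed(0)                    # fixed seed so the sketch is reproducible
population = list(range(1, 151))  # stand-in for the 150 rows of the iris data
sample = systematic_sample(population, 15)
print(len(sample))                # 10 units, spread evenly over the whole list
```

Because only the start is random, the sample is spread evenly over the entire list, which is the property that makes systematic sampling an improvement over a simple random sample of the same size.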

Systematic sampling: every 15th item from the iris data

Sepal length   Sepal width   Petal length   Petal width   Species
5.1            3.5           1.4            0.2           Setosa
5.8            4.0           1.2            0.2           Setosa
4.7            3.2           1.6            0.2           Setosa
5.1            3.8           1.9            0.4           Setosa
5.2            2.7           3.9            1.4           Versicolor
6.4            2.9           4.3            1.3           Versicolor
5.5            2.5           4.0            1.3           Versicolor
6.5            3.0           5.8            2.2           Virginica
6.0            2.2           5.0            1.5           Virginica
6.1            2.6           5.6            1.4           Virginica
5.9            3.0           5.1            1.8           Virginica

Mean, median, mode, standard deviation, range

S. no.   Sepal length   Sepal width   Petal length   Petal width
1        5.1            3.5           1.4            0.2
2        5.8            4.0           1.2            0.2
3        4.7            3.2           1.6            0.2
4        5.1            3.8           1.9            0.4
5        5.2            2.7           3.9            1.4
6        6.4            2.9           4.3            1.3
7        5.5            2.5           4.0            1.3
8        6.5            3.0           5.8            2.2
9        6.0            2.2           5.0            1.5
10       6.1            2.6           5.6            1.4
11       5.9            3.0           5.1            1.8

Mean (sum of all the set elements / number of elements):
         5.6636         3.0364        3.6182         1.0818
Median (middle value separating the greater and lesser halves of the data set):
         5.8            3.0           4.0            1.3
Mode (most frequent value in the data set):
         5.1            3.0           #N/A           0.2
Standard deviation (how far measurements for a group are spread out from the average, or expected, value):
         0.5853         0.5537        1.7685         0.7097
Range (difference between the highest and lowest values):
         1.8            1.8           4.6            2.0
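The manual results above can be cross-checked with Python's standard statistics module. This is only a verification sketch, shown here for the sepal-length column of the 11 sampled rows:

```python
import statistics

# The 11 sepal-length values from the systematic sample above.
sepal_length = [5.1, 5.8, 4.7, 5.1, 5.2, 6.4, 5.5, 6.5, 6.0, 6.1, 5.9]

mean = statistics.mean(sepal_length)                  # sum of elements / count
median = statistics.median(sepal_length)              # middle of the sorted values
mode = statistics.mode(sepal_length)                  # most frequent value
stdev = statistics.stdev(sepal_length)                # sample standard deviation
value_range = max(sepal_length) - min(sepal_length)   # highest minus lowest

print(round(mean, 4), median, mode, round(stdev, 4), round(value_range, 1))
# → 5.6636 5.8 5.1 0.5853 1.8
```

Note that the range comes out as 6.5 − 4.7 = 1.8, which is the value used in the table above.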

Hypothesis-testing
After analyzing the data, the researcher is in a position to test the hypotheses. Do the facts
support the hypotheses, or do they happen to be contrary? This is the usual question to be
answered while testing hypotheses. Various tests, such as the chi-square test, t-test, and
F-test, have been developed by statisticians for this purpose. The hypotheses may be tested
through one or more of such tests, depending upon the nature and object of the research
inquiry. Hypothesis-testing results in either accepting the hypothesis or rejecting it. If the
researcher had no hypotheses to start with, generalisations established on the basis of the
data may be stated as hypotheses to be tested by subsequent research.

Hypothesis Testing - Analysis of Variance (ANOVA)

Illustration of ANOVA procedure using the five step approach. Because the computation of
the test statistic is involved, the computations are often organized in an ANOVA table. The
ANOVA table breaks down the components of variation in the data into variation between
treatments and error or residual variation. Statistical computing packages also produce
ANOVA tables as part of their standard output for ANOVA, and the ANOVA table is set up as
follows:

Source of Variation   Sums of Squares (SS)   Degrees of Freedom (df)   Mean Squares (MS)   F
Between Treatments    SSB                    k - 1                     MSB = SSB/(k-1)     F = MSB/MSE
Error (or Residual)   SSE                    N - k                     MSE = SSE/(N-k)
Total                 SST                    N - 1

where

X = an individual observation,

X̄j = the sample mean of the j-th treatment (or group),

X̄ = the overall sample mean,

k = the number of treatments or independent comparison groups, and

N = the total number of observations (total sample size).

The ANOVA table above is organized as follows.

The first column is entitled "Source of Variation" and delineates the between treatment and
error or residual variation. The total variation is the sum of the between treatment and error
variation.

The second column is entitled "Sums of Squares (SS)". The between-treatment sum of
squares is

SSB = Σj nj (X̄j − X̄)²

and is computed by summing the squared differences between each treatment (or group)
mean and the overall mean. The squared differences are weighted by the sample sizes per
group (nj). The error sum of squares is

SSE = Σj Σi (Xi − X̄j)²

and is computed by summing the squared differences between each observation and its group
mean (i.e., the squared differences between each observation in group 1 and the group 1
mean, the squared differences between each observation in group 2 and the group 2 mean,
and so on). The double summation (ΣΣ) indicates summation of the squared differences
within each treatment and then summation of these totals across treatments to produce a
single value. (This will be illustrated in the following examples.) The total sum of squares is

SST = Σj Σi (Xi − X̄)²

and is computed by summing the squared differences between each observation and the
overall sample mean. In an ANOVA, data are organized by comparison or treatment groups.
If all of the data were pooled into a single sample, SST would reflect the numerator of the
sample variance computed on the pooled or total sample. SST does not figure into the F
statistic directly. However, SST = SSB + SSE; thus, if two of the sums of squares are known,
the third can be computed from the other two.
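The identity SST = SSB + SSE means any two of the sums of squares determine the third. A quick check using the values computed for the sepal-length example later in this section:

```python
# Sums of squares from the sepal-length ANOVA worked out later in this section.
ssb, sse = 1.3581, 1.6148
sst = ssb + sse
print(round(sst, 4))   # 2.9729, matching the "Total" row of the completed table
```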

The third column contains the degrees of freedom. The between-treatment degrees of
freedom is df1 = k - 1. The error degrees of freedom is df2 = N - k. The total degrees of
freedom is N - 1 (and it is also true that (k-1) + (N-k) = N-1).

The fourth column contains the "Mean Squares (MS)", which are computed by dividing the
sums of squares (SS) by the degrees of freedom (df), row by row. Specifically,
MSB = SSB/(k-1) and MSE = SSE/(N-k). Dividing SST by (N-1) produces the variance of the
total sample. The F statistic is in the rightmost column of the ANOVA table and is computed
as the ratio MSB/MSE.

For the iris data, the goal is to determine whether each species' mean sepal length differs
from the others; to do this, we evaluate the variability between the group means.

We run the ANOVA using the five-step approach.

Setosa   Versicolor   Virginica
5.1      5.2          6.5
5.8      6.4          6.0
4.7      5.5          6.1
5.1                   5.9

Step 1. Set up hypotheses and determine level of significance

H0: μ1 = μ2 = μ3

H1: The means are not all equal; α = 0.05

Step 2. Select the appropriate test statistic.

The test statistic is the F statistic for ANOVA, F=MSB/MSE.

Step 3. Set up decision rule.

The appropriate critical value can be found in a table of probabilities for the F distribution
(see "Other Resources"). In order to determine the critical value of F we need the degrees of
freedom, df1 = k - 1 and df2 = N - k. In this example, df1 = 3 - 1 = 2 and df2 = 11 - 3 = 8.
The critical value is 4.459 and the decision rule is: Reject H0 if F > 4.459.

Step 4. Compute the test statistic.

To organize our computations we complete the ANOVA table. In order to compute the sums
of squares we must first compute the sample means for each group and the overall mean
based on the total sample.

                            Group 1 (Setosa)   Group 2 (Versicolor)   Group 3 (Virginica)
Sample Size                 4                  3                      4
Sample Mean                 5.175              5.7                    6.125
Sample Standard Deviation   0.4573             0.6244                 0.2629

If we pool all N = 11 observations, the pooled mean is 62.3/11 = 5.6636; the computation
below works instead with the unweighted average of the three group means,
(5.175 + 5.7 + 6.125)/3 = 17/3 = 5.6666.

We can now compute

SSB = Σj nj (X̄j − X̄)²

So, in this case (using a weight of 3 for each group, as in the table above):

SSB = 3(5.175 - 5.6666)² + 3(5.7 - 5.6666)² + 3(6.125 - 5.6666)²

SSB = 1.3581

Next we compute

SSE = Σj Σi (Xi − X̄j)²

SSE requires computing the squared differences between each observation and its group
mean. We will compute SSE in parts. For the sepal lengths of Setosa:

Setosa   (X - 5.175)   (X - 5.175)²
5.1      -0.075        0.0056
5.8      0.625         0.3906
4.7      -0.475        0.2256
5.1      -0.075        0.0056
Totals   0             0.6274

Thus, SSE(Setosa) = 0.6274.

For the sepal lengths of Versicolor:

Versicolor   (X - 5.7)   (X - 5.7)²
5.2          -0.5        0.25
6.4          0.7         0.49
5.5          -0.2        0.04
Totals       0           0.78

Thus, SSE(Versicolor) = 0.78.

For the sepal lengths of Virginica:

Virginica   (X - 6.125)   (X - 6.125)²
6.5         0.375         0.1406
6.0         -0.125        0.0156
6.1         -0.025        0.0006
5.9         -0.225        0.0506
Totals      0             0.2074

Thus, SSE(Virginica) = 0.2074.

Therefore, SSE = 0.6274 + 0.78 + 0.2074 = 1.6148.

We can now construct the ANOVA table.

Source of Variation   Sums of Squares (SS)   Degrees of Freedom (df)   Mean Squares (MS)    F
Between Treatments    1.3581                 3-1 = 2                   1.3581/2 = 0.6790    0.6790/0.2018 = 3.3647
Error (or Residual)   1.6148                 11-3 = 8                  1.6148/8 = 0.2018
Total                 2.9729                 11-1 = 10

Step 5. Conclusion.

We fail to reject H0 because F = 3.3647 < 4.459. We do not have statistically significant
evidence at α = 0.05 of a difference in mean sepal length among the three species.
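The five-step computation can be cross-checked in Python. This sketch follows the text's choice of weighting every group by 3 in SSB and using the unweighted overall mean (a fully weighted SSB would use each group's own size), so small differences from the hand-rounded values are expected:

```python
import statistics

groups = {
    "Setosa":     [5.1, 5.8, 4.7, 5.1],
    "Versicolor": [5.2, 6.4, 5.5],
    "Virginica":  [6.5, 6.0, 6.1, 5.9],
}

k = len(groups)                                   # 3 treatments
n_total = sum(len(g) for g in groups.values())    # 11 observations
group_means = {name: statistics.mean(g) for name, g in groups.items()}
overall = statistics.mean(group_means.values())   # 17/3, as used in the text

# Between-treatment sum of squares; the text weights every group by 3.
ssb = sum(3 * (m - overall) ** 2 for m in group_means.values())

# Error sum of squares: squared deviations of each value from its group mean.
sse = sum((x - group_means[name]) ** 2
          for name, values in groups.items() for x in values)

msb = ssb / (k - 1)
mse = sse / (n_total - k)
f_stat = msb / mse             # ≈ 3.365; the text's 3.3647 differs only by rounding
print(f_stat < 4.459)          # True: F is below the critical value
```

Since F stays below the critical value 4.459, the computation agrees with the conclusion in Step 5.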

Linear regression
Linear regression explains a quantitative variable by one or more quantitative variables,
which is equivalent to fitting lines between quantitative variables.

How to find a linear regression equation: steps

Species      Sepal length   Petal length
Setosa       5.1            1.4
Setosa       5.8            1.2
Setosa       4.7            1.6
Setosa       5.1            1.9
Versicolor   5.2            3.9
Versicolor   6.4            4.3
Versicolor   5.5            4.0
Virginica    6.5            5.8
Virginica    6.0            5.0
Virginica    6.1            5.6
Virginica    5.9            5.1
Total        62.3           39.8
Mean         5.6636         3.6182

[Scatter diagram: petal length (vertical axis) plotted against sepal length (horizontal axis) for
the sampled species.]

Each point represents an (x,y) pair. Note that the independent variable is on the horizontal
axis (or X-axis), and the dependent variable is on the vertical axis (or Y-axis). The scatter
plot shows a positive or direct association between sepal length and petal length.

The formula for the sample correlation coefficient is

r = Cov(x,y) / (sx · sy)

where Cov(x,y) is the covariance of x and y, defined as

Cov(x,y) = Σ(x - x̄)(y - ȳ) / (n - 1)

and sx² and sy² are the sample variances of x and y, defined as

sx² = Σ(x - x̄)² / (n - 1)   and   sy² = Σ(y - ȳ)² / (n - 1)

The variances of x and y measure the variability of the x scores and y scores around their
respective sample means (x̄ and ȳ, considered separately). The covariance measures the
variability of the (x,y) pairs around the mean of x and mean of y, considered simultaneously.

To compute the sample correlation coefficient, we need to compute the variance of sepal
length, the variance of petal length and also the covariance of sepal length and petal length.

We first summarize the sepal length data. The mean sepal length is:

62.3/11= 5.6636

To compute the variance of sepal length, we need to sum the squared deviations (or
differences) between each observed sepal length and the mean sepal length. The
computations are summarized below.

Species      Sepal length (x)   x - x̄     (x - x̄)²
Setosa       5.1                -0.5636    0.317645
Setosa       5.8                0.1364     0.018605
Setosa       4.7                -0.9636    0.928525
Setosa       5.1                -0.5636    0.317645
Versicolor   5.2                -0.4636    0.214925
Versicolor   6.4                0.7364     0.542285
Versicolor   5.5                -0.1636    0.026765
Virginica    6.5                0.8364     0.699565
Virginica    6.0                0.3364     0.113165
Virginica    6.1                0.4364     0.190445
Virginica    5.9                0.2364     0.055885
Total        62.3               0.0004     3.425455

The variance of sepal length is:

3.425455/10 = 0.3425455

Next, we summarize the petal length data. The mean petal length is:

39.8/11 = 3.6181

To compute the variance of petal length, we need to sum the squared deviations (or
differences) between each observed petal length and the mean petal length. The
computations are summarized below.

Species      Petal length (y)   y - ȳ     (y - ȳ)²
Setosa       1.4                -2.2181    4.91996761
Setosa       1.2                -2.4181    5.84720761
Setosa       1.6                -2.0181    4.07272761
Setosa       1.9                -1.7181    2.95186761
Versicolor   3.9                0.2819     0.07946761
Versicolor   4.3                0.6819     0.46498761
Versicolor   4.0                0.3819     0.14584761
Virginica    5.8                2.1819     4.76068761
Virginica    5.0                1.3819     1.90964761
Virginica    5.6                1.9819     3.92792761
Virginica    5.1                1.4819     2.19602761
Total        39.8               0.0009     31.27636371
Mean         3.6181

The variance of Petal length is:

31.27636371/10 = 3.1276

Next we compute the covariance.

To compute the covariance of sepal length and petal length, we need to multiply the
deviation from the mean sepal length by the deviation from the mean petal length for each
observation, i.e., (x - x̄)(y - ȳ).

The computations are summarized below.

Species      x - x̄     y - ȳ     (x - x̄)(y - ȳ)
Setosa       -0.5636    -2.2181    1.25012116
Setosa       0.1364     -2.4181    -0.32982884
Setosa       -0.9636    -2.0181    1.94464116
Setosa       -0.5636    -1.7181    0.96832116
Versicolor   -0.4636    0.2819     -0.13068884
Versicolor   0.7364     0.6819     0.50215116
Versicolor   -0.1636    0.3819     -0.06247884
Virginica    0.8364     2.1819     1.82494116
Virginica    0.3364     1.3819     0.46487116
Virginica    0.4364     1.9819     0.86490116
Virginica    0.2364     1.4819     0.35032116
Total                              7.64727276

The covariance of sepal length and petal length is:

Cov(x,y) = 7.64727276 / 10 = 0.764727

We now compute the sample correlation coefficient:

r = 0.764727 / (0.5853 × 1.7685) = 0.764727 / 1.0351 = 0.7388

The sample correlation coefficient indicates a strong positive correlation.

Sample correlation coefficients range from -1 to +1. In practice, meaningful correlations
can be as small as 0.4 (or -0.4) for positive (or negative) associations. There are also
statistical tests to determine whether an observed correlation is statistically significant
(i.e., statistically significantly different from zero).
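The covariance and correlation computed above, and (extending the section's heading about finding a regression equation) the least-squares line itself, can be cross-checked as follows. The regression line is not worked out in the manual text, so the slope/intercept part is an added sketch:

```python
import statistics

sepal = [5.1, 5.8, 4.7, 5.1, 5.2, 6.4, 5.5, 6.5, 6.0, 6.1, 5.9]
petal = [1.4, 1.2, 1.6, 1.9, 3.9, 4.3, 4.0, 5.8, 5.0, 5.6, 5.1]

n = len(sepal)
x_bar, y_bar = statistics.mean(sepal), statistics.mean(petal)

# Sample covariance: sum of products of paired deviations, divided by n - 1.
cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(sepal, petal)) / (n - 1)

# Correlation: covariance scaled by the two sample standard deviations.
r = cov / (statistics.stdev(sepal) * statistics.stdev(petal))

# Least-squares line (an extension; the text stops at r):
# slope = Cov(x, y) / Var(x), intercept from the point of means.
slope = cov / statistics.variance(sepal)
intercept = y_bar - slope * x_bar

print(round(cov, 6), round(r, 4))   # ≈ 0.764727 and ≈ 0.7388
```

The line always passes through the point of means (x̄, ȳ), which is why the intercept follows directly once the slope is known.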

[Line chart: sepal length and petal length across the 11 sampled observations.]

[Line chart: sepal width and petal width across the 11 sampled observations.]

Bell Curve
A bell curve is a common type of distribution for a variable, also known as the normal
distribution. The term "bell curve" originates from the fact that the graph used to depict a
normal distribution consists of a symmetrical bell-shaped curve.
The highest point on the curve, or the top of the bell, represents the most probable event in a
series of data (its mean, mode, and median in this case), while all other possible occurrences
are symmetrically distributed around the mean, creating a downward-sloping curve on each
side of the peak. The width of the bell curve is described by its standard deviation.
The term "bell curve" is used to describe a graphical depiction of a normal probability
distribution, whose underlying standard deviations from the mean create the curved bell
shape. A standard deviation is a measurement used to quantify the variability of data
dispersion in a set of given values around the mean. The mean, in turn, refers to the average
of all data points in the data set and is found at the highest point on the bell curve.
Here, the curve is evaluated from -3 to 3 standard deviations in increments of 0.1.
Note: more increments give a smoother curve.
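The NORMALIZATION column in the tables below is the normal density evaluated at x = mean + z·sd for each increment z. A minimal sketch of that computation (the function name normal_pdf is our own):

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the bell curve at x for a normal distribution N(mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 3.6181, 1.7685   # mean and standard deviation of petal length
# z runs from -3 to 3 in steps of 0.1; x = mu + z * sigma maps back to data units.
points = [(mu + (i / 10) * sigma, normal_pdf(mu + (i / 10) * sigma, mu, sigma))
          for i in range(-30, 31)]

peak = points[30][1]         # density at z = 0, i.e. at the mean: the curve's top
print(round(peak, 6))        # → 0.225582
```

The curve is symmetric about the mean, so the densities at +z and -z are equal, and the peak value matches the z = 0 row of the table below.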
Bell curve for petal length

Mean of petal length: 3.6181; standard deviation: 1.7685.

z      OUR DATA (x = mean + z·sd)   NORMALIZATION (normal density)
-3.0   -1.6874                      0.002505993
-2.9   -1.51055                     0.003365865
-2.8   -1.3337                      0.0044758
-2.7   -1.15685                     0.005892527
-2.6   -0.98                        0.007680503
-2.5   -0.80315                     0.009911394
-2.4   -0.6263                      0.012663008
-2.3   -0.44945                     0.01601755
-2.2   -0.2726                      0.020059142
-2.1   -0.09575                     0.024870566
-2.0   0.0811                       0.030529243
-1.9   0.25795                      0.037102525
-1.8   0.4348                       0.044642442
-1.7   0.61165                      0.05318014
-1.6   0.7885                       0.062720291
-1.5   0.96535                      0.073235847
-1.4   1.1422                       0.084663537
-1.3   1.31905                      0.096900533
-1.2   1.4959                       0.109802689
-1.1   1.67275                      0.12318472
-1.0   1.8496                       0.136822575
-0.9   2.02645                      0.150458157
-0.8   2.2033                       0.163806363
-0.7   2.38015                      0.176564282
-0.6   2.557                        0.188422167
-0.5   2.73385                      0.199075672
-0.4   2.9107                       0.2082387
-0.3   3.08755                      0.215656101
-0.2   3.2644                       0.221115462
-0.1   3.44125                      0.224457194
0.0    3.6181                       0.22558229
0.1    3.79495                      0.224457194
0.2    3.9718                       0.221115462
0.3    4.14865                      0.215656101
0.4    4.3255                       0.2082387
0.5    4.50235                      0.199075672
0.6    4.6792                       0.188422167
0.7    4.85605                      0.176564282
0.8    5.0329                       0.163806363
0.9    5.20975                      0.150458157
1.0    5.3866                       0.136822575
1.1    5.56345                      0.12318472
1.2    5.7403                       0.109802689
1.3    5.91715                      0.096900533
1.4    6.094                        0.084663537
1.5    6.27085                      0.073235847
1.6    6.4477                       0.062720291
1.7    6.62455                      0.05318014
1.8    6.8014                       0.044642442
1.9    6.97825                      0.037102525
2.0    7.1551                       0.030529243
2.1    7.33195                      0.024870566
2.2    7.5088                       0.020059142
2.3    7.68565                      0.01601755
2.4    7.8625                       0.012663008
2.5    8.03935                      0.009911394
2.6    8.2162                       0.007680503
2.7    8.39305                      0.005892527
2.8    8.5699                       0.0044758
2.9    8.74675                      0.003365865
3.0    8.9236                       0.002505993

[Scatter plot of the petal-length data and the normalized (bell-curve) values, using the
standard deviation 1.7685.]

Conclusion: the graph shows the normal distribution of petal length, as the shape of the
curve is bell-shaped.

Bell curve for sepal length

Mean of sepal length: 5.6636; standard deviation: 0.5852.

z      OUR DATA (x = mean + z·sd)   NORMALIZATION (normal density)
-3 3.908 0.00757322
-2.9 3.96652 0.010171792
-2.8 4.02504 0.013526062
-2.7 4.08356 0.017807476
-2.6 4.14208 0.023210816
-2.5 4.2006 0.029952667
-2.4 4.25912 0.038268165
-2.3 4.31764 0.048405738
-2.2 4.37616 0.060619605
-2.1 4.43468 0.075159938
-2 4.4932 0.092260708
-1.9 4.55172 0.112125452
-1.8 4.61024 0.134911412
-1.7 4.66876 0.160712709
-1.6 4.72728 0.189543463
-1.5 4.7858 0.221321934
-1.4 4.84432 0.255856913
-1.3 4.90284 0.292837649
-1.2 4.96136 0.331828529
-1.1 5.01988 0.372269612

-1 5.0784 0.413483808
-0.9 5.13692 0.454691131
-0.8 5.19544 0.495029994
-0.7 5.25396 0.533584985
-0.6 5.31248 0.569420032
-0.5 5.371 0.601615391
-0.4 5.42952 0.62930646
-0.3 5.48804 0.651722173
-0.2 5.54656 0.668220598
-0.1 5.60508 0.678319459
0.0 5.6636 0.68171955
0.1 5.72212 0.678319459
0.2 5.78064 0.668220598
0.3 5.83916 0.651722173
0.4 5.89768 0.62930646
0.5 5.9562 0.601615391
0.6 6.01472 0.569420032
0.7 6.07324 0.533584985
0.8 6.13176 0.495029994
0.9 6.19028 0.454691131
1 6.2488 0.413483808
1.1 6.30732 0.372269612
1.2 6.36584 0.331828529
1.3 6.42436 0.292837649
1.4 6.48288 0.255856913
1.5 6.5414 0.221321934
1.6 6.59992 0.189543463
1.7 6.65844 0.160712709
1.8 6.71696 0.134911412
1.9 6.77548 0.112125452
2 6.834 0.092260708
2.1 6.89252 0.075159938
2.2 6.95104 0.060619605
2.3 7.00956 0.048405738
2.4 7.06808 0.038268165
2.5 7.1266 0.029952667
2.6 7.18512 0.023210816
2.7 7.24364 0.017807476
2.8 7.30216 0.013526062
2.9 7.36068 0.010171792
3 7.4192 0.00757322

Scatter plot of data and normalized data

[Scatter plot of the sepal-length data and the normalized (bell-curve) values, using the
standard deviation 0.5852.]

Conclusion: the graph shows the normal distribution of sepal length, as the shape of the
curve is bell-shaped.

Reference
- http://www.intellspot.com/types-sampling-methods/
- https://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/11-correlation-and-regression
- https://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Multivariable/BS704_Multivariable5.html
- https://www.investopedia.com/terms/b/bell-curve.asp
