You are on page 1of 15

Analysis ToolPak in Excel

The Analysis ToolPak is an Excel add-in program that provides data analysis tools for financial, statistical and engineering data analysis

To load the Analysis ToolPak add-in, execute the following steps.


1. Open Excel.
2. Click on the green File tab. Click on Options.
3. Double click Add-ins. Under Add-ins, select Analysis ToolPak and click on the Go button.
4. Check Analysis ToolPak and click on OK.
5. On the Data tab, you can now click on Data Analysis.

6. The following Data Analysis box below will appear.


DESCRIPTIVE STATISTICS
(Ungrouped Data)
1. Encode the observations (raw data)
Test Scores in Statistics of 50 Students
45 Column1
30
43 Mean 37.94
36 Standard Error 1.165633751
49 Median 38.5
37 Mode 43
25 Standard Deviation 8.242275295
43 Sample Variance 67.93510204
40 Kurtosis -0.7387223656
32 Skewness -0.1468437706
53 Range 33
35 Minimum 20
20 Maximum 53
46 Sum 1897
40 Count 50
30
50
26
48
34
52
39
25
44
38
23
38
28
44
33
49
31
51
32
27
42
34
45
41
35
48
31
43
36
43
40
39
27
42
35
n = 50

2. On the Data tab, click Data Analysis. Select Descriptive Statisctics, and click OK.
3. Input Data Range, select all raw data, indicate output range (the cell where you want the output to appear), tick Summary Statistics, then click OK.
4. Result:
MEASURES OF CENTRAL TENDENCY (POSITION) AND VARIABILITY (DISPERSION)
(Grouped Data) Box-and-Whisker-Plot Construction, Analyzing, and Interpreting
1. Determine the 5 Number Summary 2. Arrange the 5 Number Summary on a number line in the following fashion.
Test Scores in Statistics Min 20 20 bulk of the data
45 observations n 50 Class Intervals Max 53 53 25% 25% Q2 25% 25%
Upper
30 Lower Limit
Min 20 Limit x-bar (mean) 38.1 Q1 N/4 12.5 31.5 Min Q1 Q3 Max
43 Max 53 20 24 Q3 3N/4 37.5 43.5 Md
36 Range 33 25 29 Md or Q2 N/2 25 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54
49 classes k 7 30 34 s cv N 50
37 interval i 5 35 39 Md 38.5 972529360.9252739 Test Scores in StatisticArranged in Ascending Order Interquartile Range (IQR) and Outliers
25 40 44 45 20 1 IQR Q3-Q1 (shows the boundary of the bulk of the data)
43 45 49 30 23 2
40 50 54 43 25 3 Test for Outliers:
32 For finding the Q1 and Q3 class R = UBh – LBl 36 25 4 1. Find IQR 12
53 Q1 = n/4 12.5 35 49 26 5 2. 1.5(IQR) 18
35 Q3 = 3n/4 37.5 Ciass Intervals
Class BoundariesFrequency (f)Class Marks (Xm) rf (%) cf< cf> cpf< cpf> fXm ІXm-xІ fІXm-xІ ІXm-xІ² fІXm-xІ² ІXm-xІ4 fІXm-xІ4 37 27 6 3. Q1 - 1.5(IQR) 49.5
20 20 - 24 19.5 - 24.5 2 22 4 2 50 4 100 44 16.1 32.2 259.21 518.42 67189.82 134379.6 25 27 7 4. Q3 + 1.5(IQR) -6
46 25 - 29 24.5 - 29.5 6 27 12 8 48 16 96 162 11.1 66.6 123.21 739.26 15180.70 91084.22 SK -0.15051 43 28 8 5. Any value more than the value in Step 3 or less than the value in Step 4 is an OUTLIER. 51, 52, a
40 Q1 class 30 - 34 29.5 - 34.5 9 32 18 17 42 34 84 288 6.1 54.9 37.21 334.89 1384.584 12461.25 <0, skewed to the left 40 30 9
30 Md Class 35 - 39 34.5 - 39.5 10 37 20 27 33 54 66 370 1.1 11 1.21 12.1 1.4641 14.641 32 30 10 An outlier is a data value that is much smalleror much larger than the other va;ue in the data set.
50 Q3 class Mo Class 40 - 44 39.5 - 44.5 12 42 24 39 23 78 46 504 3.9 46.8 15.21 182.52 231.3441 2776.129 53 31 11
26 45 - 49 44.5 - 49.5 7 47 14 46 11 92 22 329 8.9 62.3 79.21 554.47 6274.224 43919.56 35 31 12 Possible Questions which could easily be answered but just looking at the boxbplot:
48 50 - 54 49.5 - 54.5 4 52 8 50 4 100 8 208 13.9 55.6 193.21 772.84 37330.10 149320.4 20 32 13 1. Can we see the individual data? No. We use Box Plots to analyze SPREAD of the data.
34 n 50 100 1905 3114.5 433955.8 46 32 14 Remember, a measure of spread is a single number that describes how spread out or clustered togethe
52 40 33 15 Measures, Range, IQR,
39 Formula 1 K 2.151009 30 34 16 Remember that Box Plots separate the data into 4 equal parts. Even though the parts may differ in len
25 Q1 32 <3, Platykurtic 50 34 17 2. If the length of a whisker or the box is short, what does this tellyou about the values if the data in the
44 Mo 39.3 k 26 35 18 3. If the length of a whisker or the box is long, what does this tell you about the values of the data in the
38 48 35 19 4. What is the difference between the range and the IQR? Range is the spread of the whole data sheet
23 Formula 2 34 35 20
38 52 36 21
28 Q3 44 Mo 41 39 36 22
44 25 37 23
33 44 38 24
49 Δ1 2 38 38 25 Q2
31 IR 12 Δ2 5 23 39 26
51 38 39 27
32 28 40 28
27 QD 5.9375 44 40 29
42 33 40 30
34 49 41 31
45 31 42 32
41 51 42 33
35 32 43 34
48 27 43 35
31 42 43 36
43 34 43 37
36 45 44 38
43 41 44 39
40 35 45 40
39 48 45 41
27 31 46 42
42 43 48 43
35 36 48 44
43 49 45
40 49 46
39 50 47
27 51 48
42 52 49
35 53 50
Outliers

STEM AND LEAF PLOTS

STEM LEAVES
45 30 49 40 2 03556778 8
30 50 31 39 3 0011223435556678899 19
43 26 51 27 4 0001223333445568899 19
36 48 32 42 5 0123 4
49 34 27 35 50 n
37 52 42
25 39 34
43 25 45 STEM LEAVES
40 44 41 2
32 38 35 3
53 23 48 4
35 38 31 5
20 28 43
46 44 36
40 33 43
CONSTRUCTION OF FREQUENCY DISTRIBUTION TABLE & GRAPHICAL REPRESENTATIONS
1. Encode the observations (raw data)
Test Scores in Statistics of 50 Students
45
30
43
36
49
37
25
43
40
32
53
35
20
46
40
30
50
26
48
34
52
39
25
44
38
23
38
28
44
33
49
31
51
32
27
42
34
45
41
35
48
31
43
36
43
40
39
27
42
35
n = 50

2. Determine the lowest value (LV) and highest value (HV) in the data set. Compute the Range,

R = HV -LV
HV 53
LV 20
R 33
Alternative Method 2 Alternative Method 3
3. Determine the number of classes/class intervals/class limits/categories, k 2K (2 to the kth rule formula)
where: k is the number of classes greater than or equal to 5 and the result must be greater than the number of observations i = 1 + 3.322 log n log 50 1.69897
6.643978 say 7
n 50 i class interval/class width/class size If k is: 2k Find 2k which is greater than the number of observations
k 7.0711 7 Round off to the nearest integer. i = R/k 5.5 say 6 Round up ALWAYS 1 2
2 4
4. Calculate the class size (or class width), i 3 8
i = R/k 4 16
R 33 5 32
k 7 6 64 greater than the n = 50, hence k should be 6
i 4.7143 5 Round off i to the nearest value with precision the same as those the raw data 7 128
8 256
5.Construct the classes as follows. Each class is an interval of values defined by the lower and upper limits.
§ The lower limit (LL) of the lowest class is LV. The lower limits of the succeeding classes are obtained by simply adding i to the lower limit of the preceding class. Lower LimUpper Limit OR Lower LimUpper Limit If the lowest observation is not divisible by the class size, adjust the lower limit of the first class
§ The upper limit (UL) of the lowest class is computed as the lower limit of the next class minus one unit of measure. The upper limits of the succeeding classes are obtained by adding i to the UL of the preceding class. The lowest observation 20 25 18 23 20 is not dividible by i = 6 18 is divisible by 6 (adjust backward)
6. Compute for the class mark by adding the lower and upper limits of the class interval, then dividing the sum by 2. The class mark is the representative value of the corresponding interval. is the UL of the first class 26 31 24 29
7. Class boundaries are more precise expressions of the class limits by at least 0.5 of their values. They are called the true class limits. The class boundary is situated between the upper limit of one interval and the lower limit of the next interval. 32 37 30 35
38 43 36 41
CLASS INTERVALS CLASS BOUNDARIES FREQUENCIES CLASS MARKS 44 49 42 47
Lower Limit Upper Limit 50 55 48 53 The highest observation is the LL of the last class
20 24 19.5 - 24.5 2 22 I=6
25 29 24.5 - 29.5 6 27
30 34 29.5 - 34.5 9 32
35 39 34.5 - 39.5 10 37
40 44 39.5 - 44.5 12 42
45 49 44.5 - 49.5 7 47
50 54 49.5 - 54.5 4 52
55 Added 1 LL for the computation of the UL of the last class.
8. Determine the class frequencies for each class interval. On the Data tab, click Data Analysis. Select Histogram and click OK.
9. Select the Range of the raw data.
10. Click in the Bin Range box and select the range of the Upper Limits.
11. Click the Output Range box, and select the range (cells) you want the Bin - Frequency Table to appear. Bin Frequency
12. Check Chart Output, click OK. 24
29
2
6
Histogram
34 9
39 10

Frequency
44 12
49 7
54 4
More 0

24 29 34 39 44 49 54 More

Bin

13. Click the Legend on the right side of the Histogram and press Delete.
14. To remove the space in between bars, right click on any of the bars, select Format Data Series and change the Gap Width to 0.
15. Select change color of the histogram and/or add border, right click on the histogram, click Fill and/or Outline, choose desired color/s.
Histogram
16. If you want to change the design, select the histogram, Choose Design.
17. Result:

Frequency
Histogram
Frequency

24 29 34 39 44 49 54 More

Bin
24 29 34 39 44 49 54 More

Bin

18. To change the Horizontal (category) axis to Class Boundaries, right click on the category then click Select Data.

Histogram
Frequency

19.5 - 24.5 24.5 - 29.5 29.5 - 34.5 34.5 - 39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5

Bin

19. Click Edit in the Horizontal (Category) Axis Labels, 18. Input Axis Label Range, select Class Boundaries, click OK.
20. Input Axis Label Range, select Class Boundaries, click Edit in the Legend Entries (Series), change series values for Frequency, click OK.
21. Review Labels, untick unnecessary boundary/label. Click OK.
22. Select/click the Histogram, click (+) Chart Elements, tick Axes, Axis Title, and Chart Title, then properly label your Bin to Class Boundaries and Title of the chart/graph.
23. Select change color of the histogram and/or add border, right click on the histogram, click Fill and/or Outline, choose desired color/s.
24. Result:

Histogram of the Distribution of Scores in Statistics of 50 Studen…


Frequency

19.5 - 24.5 24.5 - 29.5 29.5 - 34.5 34.5 - 39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5
Class Boundaries

25. Complete the Frequency Distribution Table.


Ciass Intervals Class Boundaries Frequency Class Marks (Xm) rf (%) cf<
20 - 24 19.5 - 24.5 2 22 4 2
25 - 29 24.5 - 29.5 6 27 12 8
30 - 34 29.5 - 34.5 9 32 18 17
35 - 39 34.5 - 39.5 10 37 20 27
40 - 44 39.5 - 44.5 12 42 24 39
45 - 49 44.5 - 49.5 7 47 14 46
50 - 54 49.5 - 54.5 4 52 8 50
n 50 100

26. Construct the Frequency Polygon


27. Copy the Frequency and Class Marks columns in the FDT.
28. Select all Frequencies, click Insert Line or Area Graph, choose Line with Markers.
12

10
Frequency Class Marks (Xm)
2 22
6 27 8
9 32
10 37 6
12 42
7 47 4
4 52
2

0
29. Click the Horizontal (Category) Axis of the Frequency Polygon, click Select Data, Edit Horizontal (Category) Labels, select all Classmarks. Click OK.

Frequency Polygon of the Distribution of Scores in Statistics


of 50 Students
Frequency

22 27 32 37 42 47 52

Class Marks

30. Click graph, click + (chart elements), Add axes titles and graph title
31. Select/click the Frequency Polygon, click (+) Chart Elements, tick Axes, Axis Title, and Chart Title, then properly label your axes and title of the chart/graph.
32. Select change color of the histogram and/or add border, right click on the frequency polygon, click Fill and/or Outline, choose desired color/s.
33. Result:

Frequency Polygon of the Distribution of Scores in Statistics of 50


Students
Frequency

22 27 32 37 42 47 52

Class Marks

34. Graphical Representation of Cumulative Frequency Distributions: Ogive


35. For the “Less Than Ogive” type, this graph is constructed by plotting the less than cumulative frequencies against the upper class boundaries.
If properly constructed, the graphical device could aid in estimating the number of cases falling below any given value within the range of the distribution.
36. Copy the Upper Class Boundaries and corresponding Less Than Cumulative Frequencies (cf<) in the FDT. Add another Upper Class Boundary before the lowest class boundary with cf< = 0.
Upper Class Boundaries cf<
19.5 0
24.5 2
29.5 8
34.5 17
39.5 27
44.5 39
49.5 46
54.5 50
37. Select all cf< values , click Insert Line or Area Graph, choose Line with Markers.

50

40

30

20

10

38. Click the Horizontal (Category) Axis, Select Data, Edit Horizontal (Category) Labels, select all Upper Class Boundaries. Click OK.
39. Click graph, click + (chart elements), Add axes titles and graph title
40. Select/click the Less Than Ogive, click (+) Chart Elements, tick Axes, Axis Title, and Chart Title, then properly label your axes and title of the chart/graph.
41 Select change color of the ogive and/or add border, right click on the Less Than Ogive, click Fill and/or Outline, choose desired color/s.
42. Result:

“Less Than” Ogive of the Frequency Distribution of the


Scores in Statistics of 50 Students
Cumulative Frequency

19.5 24.5 29.5 34.5 39.5 44.5 49.5 54.5

Class Boundaries

43. For the “Greater Than Ogive” type, this graph is constructed by plotting the greater than cumulative frequencies against the lower class boundaries.
Hence the “greater than” ogive could be useful in finding how many observations lie above a specified value within the distribution.
44. Copy the Lower Class Boundaries and correspondingGreater Than Cumulative Frequencies (cf>) in the FDT. Add another Upper Class Boundary after the highest Lowest Boundary with cf< = 0.
Lower Class Boundaries cf>
19.5 50
24.5 48
29.5 42
34.5 33
39.5 23
44.5 11
49.5 4
54.5 0
45. Follow the procedure done in constructing the Less than Ogive.
46. Result:

“Greater Than” Ogive of the Frequency Distribution of the Scores


in Statistics of 50 Students
Cumulative Frequency

19.5 24.5 29.5 34.5 39.5 44.5 49.5 54.5

Class Boundaries
DETERMINATION OF THE SAMPLE SIZE
SLOVIN'S FORMULA

example:
N 5000 5000
e 0.05 0.10
n 370.3703798.03921
Adjusted n 370 98

POPULATION SIZE IS UNKNOWN


Formula 1
Z 1.96 1.96 1.645 1.645 Sampling
s 0.6 0.2 0.6 0.2 where:
e 0.05 0.05 0.05 0.05 n = required minimum sample size
n 369 246 260 173 Z2 = the abcissa of the normal curve that cuts off the area α at the tails
e = margin of error
s = standard deviation
Note: Value of Z is found in the statistical tables which contain the area under

Formula 2
Z 1.96 1.96 1.645 1.645
p 0.5 0.5 0.5 0.5 Cochran (1963)
e 0.05 0.1 0.05 0.1
no 384 96 271 68 where:
n = required minimum sample size
Z2 = the abcissa of the normal curve that cuts off the area α at the tails
(1-α) = desired confidence interval, e.g., 95%
e = desired level of precision
p = estimated proportion of an attribute that is present in the population (% of
q = 1-p
Note: Value of Z is found in the statistical tables which contain the area under

You might also like