Professional Documents
Culture Documents
The Analysis ToolPak is an Excel add-in program that provides data analysis tools for financial, statistical and engineering data analysis
2. On the Data tab, click Data Analysis. Select Descriptive Statisctics, and click OK.
3. Input Data Range, select all raw data, indicate output range (the cell where you want the output to appear), tick Summary Statistics, then click OK.
4. Result:
MEASURES OF CENTRAL TENDENCY (POSITION) AND VARIABILITY (DISPERSION)
(Grouped Data) Box-and-Whisker-Plot Construction, Analyzing, and Interpreting
1. Determine the 5 Number Summary 2. Arrange the 5 Number Summary on a number line in the following fashion.
Test Scores in Statistics Min 20 20 bulk of the data
45 observations n 50 Class Intervals Max 53 53 25% 25% Q2 25% 25%
Upper
30 Lower Limit
Min 20 Limit x-bar (mean) 38.1 Q1 N/4 12.5 31.5 Min Q1 Q3 Max
43 Max 53 20 24 Q3 3N/4 37.5 43.5 Md
36 Range 33 25 29 Md or Q2 N/2 25 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54
49 classes k 7 30 34 s cv N 50
37 interval i 5 35 39 Md 38.5 972529360.9252739 Test Scores in StatisticArranged in Ascending Order Interquartile Range (IQR) and Outliers
25 40 44 45 20 1 IQR Q3-Q1 (shows the boundary of the bulk of the data)
43 45 49 30 23 2
40 50 54 43 25 3 Test for Outliers:
32 For finding the Q1 and Q3 class R = UBh – LBl 36 25 4 1. Find IQR 12
53 Q1 = n/4 12.5 35 49 26 5 2. 1.5(IQR) 18
35 Q3 = 3n/4 37.5 Ciass Intervals
Class BoundariesFrequency (f)Class Marks (Xm) rf (%) cf< cf> cpf< cpf> fXm ІXm-xІ fІXm-xІ ІXm-xІ² fІXm-xІ² ІXm-xІ4 fІXm-xІ4 37 27 6 3. Q1 - 1.5(IQR) 49.5
20 20 - 24 19.5 - 24.5 2 22 4 2 50 4 100 44 16.1 32.2 259.21 518.42 67189.82 134379.6 25 27 7 4. Q3 + 1.5(IQR) -6
46 25 - 29 24.5 - 29.5 6 27 12 8 48 16 96 162 11.1 66.6 123.21 739.26 15180.70 91084.22 SK -0.15051 43 28 8 5. Any value more than the value in Step 3 or less than the value in Step 4 is an OUTLIER. 51, 52, a
40 Q1 class 30 - 34 29.5 - 34.5 9 32 18 17 42 34 84 288 6.1 54.9 37.21 334.89 1384.584 12461.25 <0, skewed to the left 40 30 9
30 Md Class 35 - 39 34.5 - 39.5 10 37 20 27 33 54 66 370 1.1 11 1.21 12.1 1.4641 14.641 32 30 10 An outlier is a data value that is much smalleror much larger than the other va;ue in the data set.
50 Q3 class Mo Class 40 - 44 39.5 - 44.5 12 42 24 39 23 78 46 504 3.9 46.8 15.21 182.52 231.3441 2776.129 53 31 11
26 45 - 49 44.5 - 49.5 7 47 14 46 11 92 22 329 8.9 62.3 79.21 554.47 6274.224 43919.56 35 31 12 Possible Questions which could easily be answered but just looking at the boxbplot:
48 50 - 54 49.5 - 54.5 4 52 8 50 4 100 8 208 13.9 55.6 193.21 772.84 37330.10 149320.4 20 32 13 1. Can we see the individual data? No. We use Box Plots to analyze SPREAD of the data.
34 n 50 100 1905 3114.5 433955.8 46 32 14 Remember, a measure of spread is a single number that describes how spread out or clustered togethe
52 40 33 15 Measures, Range, IQR,
39 Formula 1 K 2.151009 30 34 16 Remember that Box Plots separate the data into 4 equal parts. Even though the parts may differ in len
25 Q1 32 <3, Platykurtic 50 34 17 2. If the length of a whisker or the box is short, what does this tellyou about the values if the data in the
44 Mo 39.3 k 26 35 18 3. If the length of a whisker or the box is long, what does this tell you about the values of the data in the
38 48 35 19 4. What is the difference between the range and the IQR? Range is the spread of the whole data sheet
23 Formula 2 34 35 20
38 52 36 21
28 Q3 44 Mo 41 39 36 22
44 25 37 23
33 44 38 24
49 Δ1 2 38 38 25 Q2
31 IR 12 Δ2 5 23 39 26
51 38 39 27
32 28 40 28
27 QD 5.9375 44 40 29
42 33 40 30
34 49 41 31
45 31 42 32
41 51 42 33
35 32 43 34
48 27 43 35
31 42 43 36
43 34 43 37
36 45 44 38
43 41 44 39
40 35 45 40
39 48 45 41
27 31 46 42
42 43 48 43
35 36 48 44
43 49 45
40 49 46
39 50 47
27 51 48
42 52 49
35 53 50
Outliers
STEM LEAVES
45 30 49 40 2 03556778 8
30 50 31 39 3 0011223435556678899 19
43 26 51 27 4 0001223333445568899 19
36 48 32 42 5 0123 4
49 34 27 35 50 n
37 52 42
25 39 34
43 25 45 STEM LEAVES
40 44 41 2
32 38 35 3
53 23 48 4
35 38 31 5
20 28 43
46 44 36
40 33 43
CONSTRUCTION OF FREQUENCY DISTRIBUTION TABLE & GRAPHICAL REPRESENTATIONS
1. Encode the observations (raw data)
Test Scores in Statistics of 50 Students
45
30
43
36
49
37
25
43
40
32
53
35
20
46
40
30
50
26
48
34
52
39
25
44
38
23
38
28
44
33
49
31
51
32
27
42
34
45
41
35
48
31
43
36
43
40
39
27
42
35
n = 50
2. Determine the lowest value (LV) and highest value (HV) in the data set. Compute the Range,
R = HV -LV
HV 53
LV 20
R 33
Alternative Method 2 Alternative Method 3
3. Determine the number of classes/class intervals/class limits/categories, k 2K (2 to the kth rule formula)
where: k is the number of classes greater than or equal to 5 and the result must be greater than the number of observations i = 1 + 3.322 log n log 50 1.69897
6.643978 say 7
n 50 i class interval/class width/class size If k is: 2k Find 2k which is greater than the number of observations
k 7.0711 7 Round off to the nearest integer. i = R/k 5.5 say 6 Round up ALWAYS 1 2
2 4
4. Calculate the class size (or class width), i 3 8
i = R/k 4 16
R 33 5 32
k 7 6 64 greater than the n = 50, hence k should be 6
i 4.7143 5 Round off i to the nearest value with precision the same as those the raw data 7 128
8 256
5.Construct the classes as follows. Each class is an interval of values defined by the lower and upper limits.
§ The lower limit (LL) of the lowest class is LV. The lower limits of the succeeding classes are obtained by simply adding i to the lower limit of the preceding class. Lower LimUpper Limit OR Lower LimUpper Limit If the lowest observation is not divisible by the class size, adjust the lower limit of the first class
§ The upper limit (UL) of the lowest class is computed as the lower limit of the next class minus one unit of measure. The upper limits of the succeeding classes are obtained by adding i to the UL of the preceding class. The lowest observation 20 25 18 23 20 is not dividible by i = 6 18 is divisible by 6 (adjust backward)
6. Compute for the class mark by adding the lower and upper limits of the class interval, then dividing the sum by 2. The class mark is the representative value of the corresponding interval. is the UL of the first class 26 31 24 29
7. Class boundaries are more precise expressions of the class limits by at least 0.5 of their values. They are called the true class limits. The class boundary is situated between the upper limit of one interval and the lower limit of the next interval. 32 37 30 35
38 43 36 41
CLASS INTERVALS CLASS BOUNDARIES FREQUENCIES CLASS MARKS 44 49 42 47
Lower Limit Upper Limit 50 55 48 53 The highest observation is the LL of the last class
20 24 19.5 - 24.5 2 22 I=6
25 29 24.5 - 29.5 6 27
30 34 29.5 - 34.5 9 32
35 39 34.5 - 39.5 10 37
40 44 39.5 - 44.5 12 42
45 49 44.5 - 49.5 7 47
50 54 49.5 - 54.5 4 52
55 Added 1 LL for the computation of the UL of the last class.
8. Determine the class frequencies for each class interval. On the Data tab, click Data Analysis. Select Histogram and click OK.
9. Select the Range of the raw data.
10. Click in the Bin Range box and select the range of the Upper Limits.
11. Click the Output Range box, and select the range (cells) you want the Bin - Frequency Table to appear. Bin Frequency
12. Check Chart Output, click OK. 24
29
2
6
Histogram
34 9
39 10
Frequency
44 12
49 7
54 4
More 0
24 29 34 39 44 49 54 More
Bin
13. Click the Legend on the right side of the Histogram and press Delete.
14. To remove the space in between bars, right click on any of the bars, select Format Data Series and change the Gap Width to 0.
15. Select change color of the histogram and/or add border, right click on the histogram, click Fill and/or Outline, choose desired color/s.
Histogram
16. If you want to change the design, select the histogram, Choose Design.
17. Result:
Frequency
Histogram
Frequency
24 29 34 39 44 49 54 More
Bin
24 29 34 39 44 49 54 More
Bin
18. To change the Horizontal (category) axis to Class Boundaries, right click on the category then click Select Data.
Histogram
Frequency
19.5 - 24.5 24.5 - 29.5 29.5 - 34.5 34.5 - 39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5
Bin
19. Click Edit in the Horizontal (Category) Axis Labels, 18. Input Axis Label Range, select Class Boundaries, click OK.
20. Input Axis Label Range, select Class Boundaries, click Edit in the Legend Entries (Series), change series values for Frequency, click OK.
21. Review Labels, untick unnecessary boundary/label. Click OK.
22. Select/click the Histogram, click (+) Chart Elements, tick Axes, Axis Title, and Chart Title, then properly label your Bin to Class Boundaries and Title of the chart/graph.
23. Select change color of the histogram and/or add border, right click on the histogram, click Fill and/or Outline, choose desired color/s.
24. Result:
19.5 - 24.5 24.5 - 29.5 29.5 - 34.5 34.5 - 39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5
Class Boundaries
10
Frequency Class Marks (Xm)
2 22
6 27 8
9 32
10 37 6
12 42
7 47 4
4 52
2
0
29. Click the Horizontal (Category) Axis of the Frequency Polygon, click Select Data, Edit Horizontal (Category) Labels, select all Classmarks. Click OK.
22 27 32 37 42 47 52
Class Marks
30. Click graph, click + (chart elements), Add axes titles and graph title
31. Select/click the Frequency Polygon, click (+) Chart Elements, tick Axes, Axis Title, and Chart Title, then properly label your axes and title of the chart/graph.
32. Select change color of the histogram and/or add border, right click on the frequency polygon, click Fill and/or Outline, choose desired color/s.
33. Result:
22 27 32 37 42 47 52
Class Marks
50
40
30
20
10
38. Click the Horizontal (Category) Axis, Select Data, Edit Horizontal (Category) Labels, select all Upper Class Boundaries. Click OK.
39. Click graph, click + (chart elements), Add axes titles and graph title
40. Select/click the Less Than Ogive, click (+) Chart Elements, tick Axes, Axis Title, and Chart Title, then properly label your axes and title of the chart/graph.
41 Select change color of the ogive and/or add border, right click on the Less Than Ogive, click Fill and/or Outline, choose desired color/s.
42. Result:
Class Boundaries
43. For the “Greater Than Ogive” type, this graph is constructed by plotting the greater than cumulative frequencies against the lower class boundaries.
Hence the “greater than” ogive could be useful in finding how many observations lie above a specified value within the distribution.
44. Copy the Lower Class Boundaries and correspondingGreater Than Cumulative Frequencies (cf>) in the FDT. Add another Upper Class Boundary after the highest Lowest Boundary with cf< = 0.
Lower Class Boundaries cf>
19.5 50
24.5 48
29.5 42
34.5 33
39.5 23
44.5 11
49.5 4
54.5 0
45. Follow the procedure done in constructing the Less than Ogive.
46. Result:
Class Boundaries
DETERMINATION OF THE SAMPLE SIZE
SLOVIN'S FORMULA
example:
N 5000 5000
e 0.05 0.10
n 370.3703798.03921
Adjusted n 370 98
Formula 2
Z 1.96 1.96 1.645 1.645
p 0.5 0.5 0.5 0.5 Cochran (1963)
e 0.05 0.1 0.05 0.1
no 384 96 271 68 where:
n = required minimum sample size
Z2 = the abcissa of the normal curve that cuts off the area α at the tails
(1-α) = desired confidence interval, e.g., 95%
e = desired level of precision
p = estimated proportion of an attribute that is present in the population (% of
q = 1-p
Note: Value of Z is found in the statistical tables which contain the area under