Stat 0302B Business Statistics Spring 2011-2012

Prologue

Statistics – What is it?

Statistics deals with the collection and analysis of data to solve real-world problems.

Time | Contributor | Contribution
Ancient | Greek philosophers | Idea only; no quantitative analyses.
Ancient | Babylonians, Egyptians | Collected demographic data for tax collection and recruitment of military units.
14th Century | European marine companies | Marine insurance rates were set using data concerning the success of the transportation of goods.
17th Century | Blaise Pascal, Pierre de Fermat | Studied probability through games of chance and gambling.
17th Century | Jacob Bernoulli | Proved the law of large numbers: as the number of observations increased, “the ratio of observed successful to unsuccessful occurrences will differ from the true ratio within certain small limits.”
18th Century | Pierre Simon Laplace, Karl Friedrich Gauss | Constructed the normal curve, developed the application of probability ideas to astronomy.

Statistics underwent simultaneous horizontal and vertical development:
• Horizontal : methods spread among disciplines including astronomy, geodesy, psychology, biology, social sciences, etc.
• Vertical : understanding of mathematical probability theory led to the development of statistical inference.

Time | Contributor | Contribution
19th Century | Adolphe Quetelet | Astronomer who first applied statistical analyses to human biology.
19th Century | Sir Francis Galton | Studied genetic variation in humans, using regression and correlation.
19th Century | Karl Pearson | Studied natural selection using correlation, formed the first academic department of statistics, founded the Biometrika journal, helped develop the Chi Square analysis.
20th Century | William Sealy Gosset (Student) | Studied the process of brewing, alerted the statistics community about problems with small sample sizes, developed Student's test.
20th Century | Sir Ronald Fisher | Evolutionary biologist who developed ANOVA, stressed the importance of experimental design.
20th Century | Computer Technology | Provided many advantages over calculations by hand or by calculator, stimulated the growth of investigation into new techniques.


• The word “Statistics” is derived from the Latin word for “the state”, as the first important accumulation of data was for the purposes of the state.
• “Statistik” was probably first used by the German philosopher Gottfried Achenwall in the middle of the eighteenth century. It referred to “inquiries respecting the Population, Political Circumstances, the Production of a Country, and other Matters of State”.
• While the science of statistics was being studied in Germany, the words “Statistics” and “Statistical” were introduced into the English language around 1787 by Eberhard A.W. von Zimmermann (1743-1815).

Elements of Statistics

Statistics
    Data Collection
        Survey Sampling
        Experimental Design
        Observational Study
    Data Analysis
        Descriptive Statistics and Statistical Graphics
        Statistical Inference
            Estimation
                Point
                Interval
            Testing Hypothesis

Statistics Users
Category 1 – able to understand statistical presentations
Category 2 – able to select and apply statistical procedures to a particular problem
Category 3 – applied statisticians who help others use statistics on a particular problem
Category 4 – mathematical statisticians who develop new statistical techniques and discover new characteristics of old techniques


Chapter I Descriptive Statistics and Statistical Graphics

Terminology

Measurement : process of assigning a number or numerical code to an object
Instrument : device used to assist in the production of a number from the object
Variable : a piece of information that may be expressed as a number, changing from item to item
Data : a set of numbers representing records of observations

         | Variable 1 | Variable 2 | …… | Variable m
Unit 1   |
Unit 2   |
  •      |
  •      |
  •      |
Unit n   |

Example 1.1 Student records
Dataset : student_record.dat

1. Run Microsoft® Excel 07.
2. Office Button -> Open.
3. Change directory and select the data file student_record.dat. (You may need to first change the Files of type into All Files (*.*) to show all files.)
4. In the Text Import Wizard, select Delimited as the Original data type. Click on Next. Select both Tab and Space as the Delimiters. This specifies how the data are separated in each row of the original data file. The Data preview window shows the input data set. (Clicking Next again lets you set the Data Format of each column.) Clicking Finish loads the data into the worksheet.
5. From the Office Button menu, select Save as. You can save the worksheet in Excel Workbook (*.xlsx) format so that the data format for each variable can be stored too. You can also save the worksheet in Excel 97-2003 Workbook (*.xls) format so that the file will be compatible with older versions of Excel.
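For readers who want to see what the Text Import Wizard is doing with the Tab and Space delimiters, the following is a minimal sketch in Python. It assumes that runs of consecutive delimiters count as one separator. The sample text and its column names (id, year, grade) are hypothetical, since the actual layout of student_record.dat is not shown in these notes.

```python
import re

def parse_delimited(text):
    """Split each non-empty line on runs of tabs and/or spaces."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(re.split(r"[ \t]+", line))
    return rows

# Hypothetical sample standing in for a few rows of a .dat file;
# the real columns of student_record.dat are not known here.
sample = "id\tyear grade\n1\t2 A\n2  3\tB+\n"

rows = parse_delimited(sample)
header, records = rows[0], rows[1:]

print(header)   # variable names (one per column)
print(records)  # one list per unit (row)
```

Each inner list corresponds to one unit and each position within it to one variable, which is the data layout described in the Terminology section.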

§ 1.1 Scales of Measurement

Qualitative Scales

1. Nominal Scale
• Tells only what class a unit falls in with respect to the property.
• The classes are often called categories.
• The categories have no logical order.
e.g. sex, nationality, tutorial class.

2. Ordinal Scale
• Also tells when one unit has more of the property than does another unit.
e.g. grade, attitude (disagree, agree, strongly agree).

Quantitative Scales

3. Interval Scale
• Also tells us that one unit differs by a certain amount of the property from another unit.
• Zero is just a reference point.
e.g. temperature in degrees Celsius, time of occurrence of an event, altitude of a place.

4. Ratio Scale
• Tells us that one unit has so many times as much of the property as does another unit.
• It has a meaningful zero. Zero really means nothing.
e.g. weight, height.
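The difference between the interval and the ratio scale can be checked numerically. The sketch below is illustrative only: the two temperatures are made-up values, and the conversion to kelvins is brought in purely as an example of a temperature scale with a meaningful zero.

```python
# Two made-up temperatures in degrees Celsius (interval scale).
t1_c, t2_c = 10.0, 20.0

# The same temperatures in kelvins (a scale whose zero is an absolute zero).
t1_k, t2_k = t1_c + 273.15, t2_c + 273.15

# Differences agree on both scales, so "10 degrees warmer" is meaningful.
diff_c = t2_c - t1_c
diff_k = t2_k - t1_k

# Ratios do not agree, so "twice as hot" is not meaningful in Celsius.
ratio_c = t2_c / t1_c
ratio_k = t2_k / t1_k

print(diff_c, round(diff_k, 2))
print(ratio_c, round(ratio_k, 3))
```

On an interval scale only differences carry over when the reference point changes; statements about "how many times as much" require a ratio scale.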

§ 1.2 Distribution of Data

The distribution of a quantitative variable gives the user a rough idea of the number of units whose value of the variable falls in a certain range. It can be represented by a histogram or a boxplot. The interpretation and construction of histograms will be described in later sections.

A first glance at the above histograms gives the following impressions. The distribution of scores in 00/01-02/03 is located on the right hand side of the distribution of scores in 03/04. On the other hand, the scores in 03/04 look more symmetric and are spread out wider. Such comparisons can also be made by calculating some statistics, or summary statistics, which represent the location and spread of the distributions.

§ 1.3 Measures of Central Tendency or Location

There are several measures of central location.

A. Average / Arithmetic Mean

Suppose we have a set of data, {9, 16, 11, 19, 11, 10, 13, 12, 6, 9}, which are the sales (in $m) of 10 furniture companies in a particular year. At this point it is necessary to introduce some symbols. The above ten individual items of data (X) will be designated $X_1, X_2, \ldots, X_{10}$, and the number of these items of data by $n$. The mean $\bar{X}$, pronounced ‘x-bar’, is then computed by

$\bar{X} = \frac{\sum X}{n}$

where $\Sigma$, ‘sigma’, is the summation sign and is simply a command to add up all the 10 x-values, i.e.

$\sum X = 9 + 16 + 11 + 19 + 11 + 10 + 13 + 12 + 6 + 9 = 116$

$\bar{X} = \frac{\sum X}{n} = 116 \div 10 = 11.6$

Hence the mean sales of these ten companies is 11.6 million dollars. Similarly, the mean overall score of the 706 students in Example 1.1 is

$\bar{X} = \frac{96.9 + 96.5 + 96.5 + \cdots + 28.8 + 22.6 + 26.9}{706} = 69.6$

The sum and mean can be computed by using the Excel functions:
$\sum X$ : @SUM(data range)
$\bar{X}$ : @AVERAGE(data range)

B. Median

Arrange the data in ascending order. The median is the number with middle rank, i.e.

$\text{median} = \begin{cases} \left(\frac{n+1}{2}\right)\text{th number} & \text{if } n \text{ is odd} \\ \text{average of } \left(\frac{n}{2}\right)\text{th and } \left(\frac{n}{2}+1\right)\text{th numbers} & \text{if } n \text{ is even} \end{cases}$

where $n$ is the size of the dataset. For these ten sales of companies, {9, 16, 11, 19, 11, 10, 13, 12, 6, 9}, the sorted dataset will be

6 9 9 10 11 11 12 13 16 19

and the median is the average of the two middle numbers, which is $\frac{11 + 11}{2} = 11$ million dollars. Comparing this figure with the mean, we can see that both measures of location give quite close results. However, this is not always the case.

Example 1.2
Monthly income of five staff members (in $1000) : 13, 21, 23, 32, 35
Mean = 24.8    Median = 23

If 35 is mistyped as 135, then the data will become : 13, 21, 23, 32, 135
Mean = 44.8    Median = 23

Hence it is possible to have very different values of mean and median. From this example we can also see that the mean is more sensitive to extreme values than the median. In general, the median is the best for describing highly skewed (not symmetric) distributions or when there are one or more outlying values whose validity is suspected. Otherwise, the mean is the best as it fully utilizes all the data and is easy to study.

Using the Excel function @MEDIAN(data range), the median overall score of all the 706 students in the student record dataset is found to be 71.9.

§ 1.4 Measures of Spread (Variation)

The above histograms show the scores (based on a questionnaire of ten five-point-scale items) at five-star hotels rated by 127 male guests and 114 female guests. The locations of both distributions are more or less the same. However, the scores given by male guests spread out wider than the scores given by female guests. That means the variation in ratings by males is larger than that by females. There are several ways to assess variation, but by far the most useful is the statistic known as the standard deviation, which is given the symbol $S$. The formula for calculating the standard deviation is

$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$

Example 1.3
Sales of ten furniture companies: 9, 16, 11, 19, 11, 10, 13, 12, 6, 9

Sales ($m) X | Deviation from mean ($\bar{X} = 11.6$) $X - \bar{X}$ | Deviation squared $(X - \bar{X})^2$
9  | -2.6 | 6.76
16 | 4.4  | 19.36
11 | -0.6 | 0.36
19 | 7.4  | 54.76
11 | -0.6 | 0.36
10 | -1.6 | 2.56
13 | 1.4  | 1.96
12 | 0.4  | 0.16
6  | -5.6 | 31.36
9  | -2.6 | 6.76
$\sum X = 116$ | $\sum (X - \bar{X}) = 0$ | $\sum (X - \bar{X})^2 = 124.4$

$\frac{\sum (X - \bar{X})^2}{n - 1} = \frac{124.4}{9} = 13.8222$

$S = \sqrt{13.8222} = 3.7178$

The standard deviation can be computed by using the Excel function: @STDEV(data range)
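The summary statistics introduced so far can be cross-checked outside Excel. The following is a minimal Python sketch using the furniture sales data and the income data of Example 1.2; the function names are illustrative choices, not part of the course material.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by n."""
    return sum(values) / len(values)

def median(values):
    """Middle-ranked value, or the average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def sample_sd(values):
    """Standard deviation with the n - 1 divisor, as in the formula for S."""
    m = mean(values)
    squared_deviations = sum((x - m) ** 2 for x in values)
    return (squared_deviations / (len(values) - 1)) ** 0.5

sales = [9, 16, 11, 19, 11, 10, 13, 12, 6, 9]   # furniture sales ($m)
income = [13, 21, 23, 32, 35]                   # Example 1.2
income_typo = [13, 21, 23, 32, 135]             # 35 mistyped as 135

print(mean(sales), median(sales), round(sample_sd(sales), 4))
print(mean(income), median(income))
print(mean(income_typo), median(income_typo))
```

Comparing the last two printed lines shows the point of Example 1.2: the single mistyped value moves the mean by 20 while leaving the median unchanged.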

Example 1.4 Hotel Rating Data
Dataset : hotel_scores.dat

1. Read the data into an Excel worksheet. The first column will contain the gender (M/F) of the guests and the second column will contain the scores given by the guests.
2. In any empty cells, input @AVERAGE(B2:B242), @MEDIAN(B2:B242), @STDEV(B2:B242) to obtain respectively the mean, median, and standard deviation of the scores given by all the guests.
3. To compute the standard deviation for male scores only, we can use the array formula:
   @STDEV(IF(A2:A242="M",B2:B242))
   Press Ctrl-Shift-Enter instead of just Enter for an array formula. This will compute the standard deviation of the data in the range B2:B242 that satisfy the criterion A2:A242="M", i.e. gender=male. The other statistics, and the statistics for female scores, can be computed similarly.

Remarks
1. The square of the standard deviation, $S^2$, is called the variance.
2. A formula that is more convenient for calculating the standard deviation by hand is

$S = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}}$
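The array formula in step 3 and the shortcut formula in Remark 2 can both be sketched in Python. The (gender, score) records below are hypothetical stand-ins, since the contents of hotel_scores.dat are not reproduced in these notes; the furniture sales figures are the ones used earlier in this chapter.

```python
def sample_sd(values):
    """Definition form: sqrt( sum (X - mean)^2 / (n - 1) )."""
    n = len(values)
    m = sum(values) / n
    return (sum((x - m) ** 2 for x in values) / (n - 1)) ** 0.5

def sample_sd_shortcut(values):
    """Shortcut form: sqrt( (sum X^2 - (sum X)^2 / n) / (n - 1) )."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return ((sum_x2 - sum_x ** 2 / n) / (n - 1)) ** 0.5

# Hypothetical (gender, score) records standing in for hotel_scores.dat.
ratings = [("M", 32), ("F", 35), ("M", 41), ("F", 36), ("M", 27), ("F", 38)]

# Same idea as STDEV(IF(A2:A242="M", B2:B242)): keep only the male scores.
male_scores = [score for gender, score in ratings if gender == "M"]
print(round(sample_sd(male_scores), 4))

# Both forms give the same value for the furniture sales data.
sales = [9, 16, 11, 19, 11, 10, 13, 12, 6, 9]
print(round(sample_sd(sales), 4), round(sample_sd_shortcut(sales), 4))
```

The shortcut form needs only the two running totals $\sum X$ and $\sum X^2$, which is why it suits hand calculation; with floating-point arithmetic and very large values the definition form is usually the safer choice.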

Example 1.5

Sales ($m) X | $X^2$
9  | 81
16 | 256
11 | 121
19 | 361
11 | 121
10 | 100
13 | 169
12 | 144
6  | 36
9  | 81
$\sum X = 116$ | $\sum X^2 = 1470$

$\sum X^2 - \frac{(\sum X)^2}{n} = 1470 - \frac{116^2}{10} = 124.4$

$S^2 = \frac{124.4}{10 - 1} = 13.8222$

$S = \sqrt{13.8222} = 3.7178$

§ 1.5 Tables and Graphs for One-dimensional Data

A. Bar Chart
• Display summarized data where there is no emphasis on the percentage of a total.

[Figure: bar chart, Mean Overall Scores of Math244 students; x-axis: Year (1, 2, 3, *), y-axis: Mean Overall (0 to 90)]

B. Dot Diagram

[Figure: dot diagram on an axis running from 80 to 130]

Remarks
1. It is easy to construct.
2. It is compact and can be used in the margins of other displays to add information.
3. Not suitable when we have too many points.

C. Pie Charts
• A simple descriptive display of data that sum to a given total.
• Most illustrative way of displaying percentages.
• For nominal or ordinal data.

[Figure: pie chart, Grade Distribution of Math244 Students, with one slice per grade from A+ to F]

D. Frequency Distribution

e.g. Table : Flow of vehicles passing through a particular point during an hour.

Vehicles    | Frequency | Percentage
Cars        | 45 | 59
Lorries     | 22 | 29
Motorcycles | 6  | 8
Buses       | 3  | 4
Total       | 76 | 100

e.g. Table : Number of applicants by age

Age         | No. of Applicants | Cumulative Frequencies
below 20    | 2  | 2
20-22       | 3  | 5
22-24       | 14 | 19
24-26       | 26 | 45
26-28       | 20 | 65
28 or above | 5  | 70
Total       | 70 |

”20-22” means “greater than or equal to 20 but less than 22”

Example 1.6 Student Record Data
1. Open the student record worksheet.
2. Create a column of grade categories (column J in the following table).
3. In cell K2, input the function @COUNTIF(G$2:G$707,J2) to obtain the frequency of A+. Copy this cell to cells K3 to K12 to obtain the frequencies of other grades.
4. In cell K13, input @SUM(K2:K12) to calculate the total frequency.
5. Use simple arithmetic operations to calculate the percentages and cumulative percentages.
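The counting and the "simple arithmetic operations" of Example 1.6 can be sketched in Python as follows. The vehicle counts are taken from the table above; the short list of grades is a made-up illustration of what COUNTIF does, not data from the student record file.

```python
from collections import Counter

def frequency_table(counts):
    """Return rows of (category, frequency, percentage, cumulative frequency) and the total."""
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for category, freq in counts.items():
        cumulative += freq
        rows.append((category, freq, round(100 * freq / total), cumulative))
    return rows, total

# Counting raw observations per category (the role of COUNTIF).
# These grades are hypothetical.
grades = ["A", "B", "A", "C", "B", "A"]
grade_counts = Counter(grades)
print(dict(grade_counts))

# Vehicle flow table from the notes.
vehicles = {"Cars": 45, "Lorries": 22, "Motorcycles": 6, "Buses": 3}
rows, total = frequency_table(vehicles)
for row in rows:
    print(row)
print("Total", total)
```

Cumulative frequencies are only meaningful when the categories have an order, as in the applicants-by-age table; for the vehicle table they are computed here simply to show the arithmetic.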

Remarks
1. For ordinal data, categories should be arranged in natural order.
2. For nominal data, arrange the categories in a meaningful way. If no natural order is formed, arrange the categories so that their associated frequencies are decreasing.
3. Total size of the data should always be included. Counts can be omitted provided that they can be recovered if desired.
4. Percentages and/or cumulative percentages should be included if they are of interest to the reader.
5. For continuous data, information is lost through grouping of data. Different choices of class-intervals may give different impressions.

E. Histogram

A histogram is a very suitable form of representation of a data distribution, especially for large datasets. The histogram itself is a frequency diagram because it shows the frequency-of-occurrence of results within particular intervals.

To construct a histogram, one should:
1. Partition the range of data into several intervals (not necessarily of equal widths).
2. Draw rectangles in the intervals. The area of each rectangle should be proportional to the corresponding count.

When inspecting a histogram, the most important point to keep in mind is that counts of data are represented by area, rather than height.

To construct a histogram using Excel, the Analysis ToolPak should be loaded first:
1. Office Button -> Excel Options
2. Click the Add-ins tab on the left panel.
3. At the Manage menu, select Excel Add-ins, then click Go.
4. Select Analysis ToolPak and click on OK. (If prompted that the Analysis ToolPak is not installed, install it.)

After the Analysis ToolPak is successfully loaded, the Data Analysis command will be added to the Data menu. This command provides some handy functions for performing statistical analyses. Detailed instructions can be found at:
http://office.microsoft.com/en-us/excel-help/load-the-analysis-toolpak-HP001127724.aspx

There is a histogram function in the added data analysis command. However, it produces the histogram as a column bar chart, with numerical labels misaligned. A handy add-in built on the basis of the original histogram command was developed by Prof. Michael R. Middleton, School of Business and Management, University of San Francisco.

1. Download the add-in BetterHistogram_20070222_2050.xla from the course web.
2. Office Button -> Excel Options -> Add-ins -> Excel Add-ins -> Go.
3. Click Browse, change directory to select the downloaded add-in file, then click OK.
4. After the Better Histogram add-in is successfully loaded, the Better Histogram command will be added to the Add-Ins menu.

Detailed instructions on the use of this command can be found in the book “Data Analysis Using Microsoft Excel: Updated for Office 07”, or the following site:
http://www.treeplan.com/BetterHistogram_20041117_1555.htm

Example 1.7 Frequency histogram with equal width classes
1. Open the student record worksheet.
2. Add-Ins -> Better Histogram
3. Input F2:F707 as the Data Range for constructing the histogram. Input 0 as the Start Value, 10 as the Step Value, 100 as the Stop Value to define the intervals.
4. Click OK to construct the graph.
5. Format the graph (color, fonts, titles, etc) as appropriate.

[Figure: histogram, Overall Scores of Math244 Students (00/01-03/04) (N = 706); x-axis: Overall (0 to 100), y-axis: Frequency (0 to 250)]

From this histogram we can see that the rectangular blocks from 50 to 90 comprise most of the area. Hence most of the students scored within 50 to 90. The graph is not symmetric as there is a tail pointing towards the left. It indicates that there are fewer low-score students than high-score students, compared with the average students. This kind of deviation from symmetry is called skewness. Since the tail of the histogram points towards the left hand side, the distribution is said to be skewed to the left, or negatively skewed.

The area under the whole distribution represents the total number of data, i.e. 706. Since the areas represent frequencies, formally the scale of the y-axis should be frequencies per unit score rather than just frequencies (height = area ÷ width). Usually the y-scale will be further adjusted to make the total area equal to 1. Such a y-scale is called the density.

$\text{density} = \frac{\text{frequency} / \text{datasize}}{\text{width}} = \frac{\text{relative frequency}}{\text{width}}$

Changing the y-axis scale makes no difference to these two histograms because, with the same class width, the heights of the rectangular blocks are directly proportional to the areas. However, one must use density as the y-axis scale whenever there are unequal class widths, as illustrated by the following examples.

Example 1.8 Frequency histogram with unequal width classes
(Incorrect construction of histogram)

The shape of the distribution is totally ruined. It gives the incorrect impression that many more students scored 35 to 70 than 70 to 90.

Example 1.9 Density histogram with unequal width classes

This histogram provides a more detailed description of the data distribution from 70 to 90, yet the shape is still preserved. So preserving the perceived shape of the distribution is the main reason for using area, rather than height, to represent frequencies.
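The effect of unequal class widths can be seen by computing the densities directly. The sketch below applies density = relative frequency ÷ width to a set of made-up class counts; the class boundaries and counts are hypothetical and are not the Math244 data.

```python
def densities(counts, edges):
    """Density of each class: (count / total) / class width."""
    total = sum(counts)
    return [
        count / total / (upper - lower)
        for count, lower, upper in zip(counts, edges, edges[1:])
    ]

# Hypothetical classes: one wide class (35 to 70) followed by narrow ones.
edges = [0, 35, 70, 80, 90, 100]
counts = [5, 35, 30, 20, 10]

dens = densities(counts, edges)
widths = [upper - lower for lower, upper in zip(edges, edges[1:])]

for lower, upper, count, d in zip(edges, edges[1:], counts, dens):
    print(f"{lower:>3}-{upper:<3} count={count:<3} density={d:.4f}")

# Total area (density x width summed over classes) equals 1.
total_area = sum(d * w for d, w in zip(dens, widths))
print(round(total_area, 6))
```

In this illustration the 35 to 70 class has the largest count but a lower density than the 70 to 80 class, which is exactly the distortion a frequency-scaled histogram with unequal widths would hide.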

Relative frequency and Probability

We saw from the above histogram that a given class of results, say those lying between 60 and 70, made up about 0.02 × (70 − 60) = 20% of the total. This 20% is called the relative frequency of scores between 60 and 70. If the population of all students (including those in future semesters) has more or less the same distribution as this dataset, we may infer that there would be around a 20% chance of observing a student scoring in this range. The process of statistical inference (with suitable assumptions) allows us to equate in a numerical fashion the relative frequency of past events and the probability of future events. Hence it is important to understand the interpretation of a histogram, as there will be some similarities between a histogram and a probability density curve, which will be discussed in Chapter III.

Remarks
1. A histogram is a more suitable form of representation than a frequency distribution table when the class-intervals have unequal widths.
2. No gap between blocks.
3. It is invalid to use a broken vertical scale in a histogram.
4. There is no good way to represent graphically the open-ended intervals when they have non-zero frequencies.
5. Size of dataset should be given.

F. Percentile

In a dataset of n observations, the (100p)th percentile (0 < p < 1) has approximately np observations less than or equal to it and also n(1-p) observations greater than it.

e.g. Test scores of ten students, sorted : 34, 47, 58, 63, 68, 71, 75, 79, 83, 90

There are 3 students having scores less than or equal to 58. Therefore the datum 58 is a 30th percentile. Note that under the above definition, all the values between 58 and 63 are also 30th percentiles.

Let $X_{(i)}$ denote the ith smallest value (such that $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$). The following definition provides a formula to compute the percentile uniquely:

$(100p)\text{th percentile} = X_{(r)} + f\,(X_{(r+1)} - X_{(r)})$

where $r$ and $f$ are the integer part and fractional part of $(n + 1)p$ respectively.

e.g. For the ten test scores, $(n + 1)p = 11 \times 0.3 = 3.3$, so $r = 3$, $f = 0.3$ and

$30\text{th percentile} = X_{(3)} + 0.3\,(X_{(4)} - X_{(3)}) = 58 + 0.3\,(63 - 58) = 59.5$

In a relative frequency histogram, the 100pth percentile is the cutoff point on the x-axis such that the total area of the histogram on the left of this cutoff point is equal to 100p percent. For a grouped frequency distribution table,

$(100p)\text{th percentile} = L_p + \frac{d\,(np - F_{p-1})}{f_p}$

where
$L_p$ = lower class boundary of the class of this percentile
$n$ = number of observations
$F_{p-1}$ = cumulative frequency below the class of this percentile
$f_p$ = frequency of the class of this percentile
$d$ = class width of the class of this percentile

If the percentile lies a distance $a$ above $L_p$ within its class, then by proportion

$\frac{a}{d} = \frac{np - F_{p-1}}{f_p} \;\Rightarrow\; a = \frac{d\,(np - F_{p-1})}{f_p}$

$(100p)\text{th percentile} = L_p + a = L_p + \frac{d\,(np - F_{p-1})}{f_p}$
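The interpolation rule for ungrouped data, based on the integer and fractional parts of (n + 1)p, can be sketched in Python as below. The handling of positions that fall before the first or after the last observation (returning the smallest or largest value) is an assumption of this sketch; the notes do not say what to do in that case.

```python
import math

def percentile(values, p):
    """(100p)th percentile via X(r) + f * (X(r+1) - X(r)), with (n + 1)p = r + f."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) * p
    r = math.floor(position)   # integer part
    f = position - r           # fractional part
    if r < 1:                  # assumption: clamp to the smallest value
        return ordered[0]
    if r >= n:                 # assumption: clamp to the largest value
        return ordered[-1]
    # ordered[r - 1] is X(r) because Python lists are indexed from 0.
    return ordered[r - 1] + f * (ordered[r] - ordered[r - 1])

scores = [34, 47, 58, 63, 68, 71, 75, 79, 83, 90]
print(round(percentile(scores, 0.30), 2))
print(round(percentile(scores, 0.50), 2))
```

Different software packages use slightly different position rules, so a percentile reported elsewhere may not match this one exactly.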

e.g. Table : Frequency table for 20 grain bullet penetration depths into oak wood from a distance of 15 feet.

Penetration Depth (mm) | Frequency | Cumulative Frequency
58 – 60 | 5  | 5
60 – 62 | 3  | 8
62 – 64 | 6  | 14
64 – 66 | 3  | 17
66 – 68 | 1  | 18
68 – 70 | 0  | 18
70 – 72 | 2  | 20
Total   | 20 |

”58 - 60” means “greater than or equal to 58 but less than 60”

For p = 0.75, $np = 15$, $F_{p-1} = 14$, $L_p = 64$, $d = 2$, $f_p = 3$.

$75\text{th percentile} = 64 + \frac{2\,(15 - 14)}{3} = 64.67$

Excel percentile function: =PERCENTILE(data range, p)
(Excel interpolates at position (n − 1)p + 1, so its result can differ slightly from the definitions above.)

G. Boxplot

The 25th, 50th, 75th percentiles cut the data into four pieces and are given the special names lower quartile, median, upper quartile respectively. These three values, together with the maximum and minimum, provide a five-number summary of the data. A boxplot is a graphical display of the five-number summary. It is simple and compact. Although it is less informative than the histogram, it can give a good picture of the centre and spread of the distribution.

e.g. Boxplot of the bullet penetration data.
(Min = 58, QL = 60, Median = 62.67, QU = 64.67, Max = 72)

[Figure: boxplot on an axis running from 55 to 75]
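The quartiles quoted for the bullet penetration data follow from the grouped-data percentile formula. A minimal Python sketch is given below; it assumes equal class widths and, when np falls exactly on a class boundary, takes the lower class, which is the choice that reproduces QL = 60 in the notes.

```python
def grouped_percentile(lower_bounds, width, freqs, p):
    """L_p + d * (n*p - F_{p-1}) / f_p for a grouped frequency table."""
    n = sum(freqs)
    target = n * p
    cum_below = 0                      # F_{p-1}: cumulative frequency below the class
    for lower, freq in zip(lower_bounds, freqs):
        if freq > 0 and cum_below + freq >= target:
            return lower + width * (target - cum_below) / freq
        cum_below += freq
    raise ValueError("p must satisfy 0 < p <= 1")

# Bullet penetration depths (mm): classes 58-60, 60-62, ..., 70-72.
lower_bounds = [58, 60, 62, 64, 66, 68, 70]
freqs = [5, 3, 6, 3, 1, 0, 2]
width = 2

q_lower = grouped_percentile(lower_bounds, width, freqs, 0.25)
q_median = grouped_percentile(lower_bounds, width, freqs, 0.50)
q_upper = grouped_percentile(lower_bounds, width, freqs, 0.75)

# Five-number summary: min and max are taken from the outer class boundaries.
summary = (58, round(q_lower, 2), round(q_median, 2), round(q_upper, 2), 72)
print(summary)
```

With grouped data the minimum and maximum are only known up to the class boundaries, so 58 and 72 here are the outer boundaries of the table rather than observed values.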

i. Example 1. the scores of audit (*) and year 2 students spread out a little bit wider than year 1 and year 3 students. about 50% of the overall scores located at the centre. from 60 to 80 mm.10 Boxplot of Student Record Data (created by another statistical package) Lower Quartile Median Upper Quartile The box comprises the middle half of the dataset.e. The stars represents data points that is too extreme compared to the others and is sometimes regarded as outliers.11 Comparison of several boxplots We can easily compare the distributions of the data in several groups by just one graph. For example. P.20 . The location of the distribution of audit students is on the right of all the others. which suggesting that they performed better than other students in terms of examination results.Stat 0302B Business Statistics Spring 2011-2012 Example 1.

H. Stem-and-Leaf Display

Each data value is split into two components called stem and leaf.

Data | Split | Stem | Leaf | Unit of Leaf
22.9 | 22|9 | 22 | 9 | (0.1)
or 22.9 | 2|2.9 | 2 | 29 | (0.1)
or 22.9, truncated to 22 | 2|2 | 2 | 2 | (1)

e.g. Data : 78, 79, 86, 65, 62, 90, 51, 79, 84, 101

Stem | Leaf
 5*  | 1
 6   | 25
 7   | 899
 8   | 46
 9   | 0
10*  | 1
n = 10    Leaf unit = 1

A stem-and-leaf plot is more informative than a histogram as it displays the raw data. However, it is not suitable for large datasets.

There are variations of stem-and-leaf displays.

Stretched stem-and-leaf display :

3+ | 8
4* | 0014
4+ | 556779
5* | 013
5+ | 8
n = 15    Leaf unit = 1
* for 0-4
+ for 5-9

Squeezed stem-and-leaf display :

1* | 0
 t | 23
 f | 4445
 s | 67777
 + | 889
2* | 01
 t | 2
n = 18    Leaf unit = 1
* for 0,1
t for 2,3
f for 4,5
s for 6,7
+ for 8,9
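A basic stem-and-leaf display (one line per stem, leaf unit = 1) can be produced with a few lines of Python. This sketch assumes non-negative whole-number data and does not implement the stretched or squeezed variants; the ten values are the ones used in the first example of this section.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines with the tens part as stem and the units digit as leaf."""
    groups = defaultdict(list)
    for value in sorted(values):
        groups[value // 10].append(value % 10)
    lines = []
    # Include every stem between the smallest and largest, even if it has no leaves.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

data = [78, 79, 86, 65, 62, 90, 51, 79, 84, 101]
display = stem_and_leaf(data)
for line in display:
    print(line)
print(f"n = {len(data)}   Leaf unit = 1")
```

Keeping empty stems in the display preserves the horizontal scale, so gaps in the data remain visible just as they would in a histogram.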

Back-to-back stem-and-leaf plot :

 Group A | Stem | Group B
      84 |  1   | 122355567
     531 |  2   | 0111222346777899
  988421 |  3   | 012457
   85200 |  4   | 11257
  976540 |  5   | 0236
97655210 |  6   | 02
n = 72    Leaf unit = 1

Example 1.12 Stem-and-leaf display of student record data (created by other statistical package)

Stem-and-Leaf Display: Overall
Stem-and-leaf of Overall    Year = 1    N = 190
Leaf Unit = 1.0

   1   2  2
   2   2  6
   3   3  2
   4   3  5
   8   4  0034
  19   4  55778888999
  33   5  00122233344444
  52   5  5556666777888899999
  71   6  0011111223333334444
 (25)  6  5555566667778888888999999
  94   7  00011112233333344444444
  71   7  5555566677788888899999
  49   8  0000000001111112334444
  27   8  555666666777788999
   9   9  01112223
   1   9  6

The numbers in the first column are the cumulative frequencies of each class interval, accumulating from the two ends. Note that (25) is the frequency of the median class interval [65,70).

§ 1.6 Cautions about graphs

• Pictures can be deceptive even when there is no intention to deceive.
• “lying with statistics” : presenting data graphically on a stretched or compressed scale of numbers with the aim of making the data show whatever you want to show.
• Statistical tests tend to be more objective than human eyes and are less prone to deception as long as the corresponding assumptions hold.

Example 1.13
Four brands of cigarette : A, B, C, D

Pie chart : Market penetration of the four brands.

[Figure: two pie charts of the same data (A 27%, B 37%, C 18%, D 18%)]

Bar graph : Monthly sales of the four brands.

[Figure: two bar charts of Monthly sales ($m) for brands A, B, C, D, drawn with different vertical scales]

Example 1.14
The representations are not directly proportional to the quantities represented.

Example 1.15
Failure to show the relevant context produces a thoroughly misleading display.

Example 1.16 Misleading alignment of graphs

Example 1.17

"This may well be the worst graphic ever to find its way into print." (Tufte, 1983)

This graph uses colours, 3D effects and disguised redundancy to display just five numbers. Note the clever use of mirror-imaging (the top series is just 100 minus the bottom series) and the interesting use of curved lines, front and back, to avoid the appearance that there's a lot less here than meets the eye.

A simple bar chart displaying the same set of data is given below:

§ 1.7 Measures of Skewness

Shape of histograms:
• L shape
• J shape
• Bell shape
• U shape

Skewness indicates how far the shape of the histogram departs from a symmetric shape.

• Skewed to the right (Positively skewed)
• Skewed to the left (Negatively skewed)

Measures of skewness:

$\gamma_1 = \frac{\frac{1}{n} \sum (X - \bar{X})^3}{S^3}$

$\gamma_1 > 0$ : skewed to the right
$\gamma_1 < 0$ : skewed to the left
$\gamma_1 = 0$ : not skewed (symmetric)

Another measure of skewness:

$\gamma_2 = \frac{\text{mean} - \text{median}}{S}$

The skewness can be computed by using the Excel function: @SKEW(data range)

The Descriptive Statistics command in the Data Analysis add-in provides an integrated calculation for all the summary statistics described in this chapter.

Example 1.18 Descriptive Statistics of Student Record Data
1. Open the student record worksheet.
2. Data -> Data Analysis. Select Descriptive Statistics, then click OK.
3. Type F1:F707 in the Input Range field.
4. Check the Labels in First Row box. This will specify the first row as the label of the variable.
5. Check the Summary Statistics box to request the calculation of summary statistics.
6. Click OK. The summary statistics will be computed and listed in a new worksheet.
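The two skewness measures defined in this section can be computed directly from their formulas. The sketch below does so in Python for the furniture sales data used earlier in the chapter, with S taken as the sample standard deviation (n − 1 divisor). Note that Excel's SKEW function applies a sample-size adjustment, so its value will generally not equal the γ1 computed here.

```python
def skewness_measures(values):
    """Return (gamma1, gamma2) as defined in this section."""
    n = len(values)
    mean = sum(values) / n
    # S: sample standard deviation with the n - 1 divisor.
    s = (sum((x - mean) ** 2 for x in values) / (n - 1)) ** 0.5

    # gamma1: average cubed deviation divided by S cubed.
    gamma1 = (sum((x - mean) ** 3 for x in values) / n) / s ** 3

    # gamma2: (mean - median) / S.
    ordered = sorted(values)
    mid = n // 2
    median = ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2
    gamma2 = (mean - median) / s

    return gamma1, gamma2

sales = [9, 16, 11, 19, 11, 10, 13, 12, 6, 9]   # furniture sales ($m)
gamma1, gamma2 = skewness_measures(sales)
print(round(gamma1, 4), round(gamma2, 4))
```

Both measures come out positive for these data, which agrees with the mean (11.6) lying above the median (11): the distribution of sales is slightly skewed to the right.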
