
Contents:
2.1 Introduction
2.1.1 Types of Data
2.2 Some Definitions
2.3 Frequency Distribution
2.3.1 Graphical presentation of Frequency distribution
2.4 Measure of Central tendency
2.4.1 Arithmetic Mean
2.4.2 Median
2.4.3 Mode
2.5 Measure of Dispersion
2.5.1 Range
2.5.2 Mean Deviation
2.5.3 Variance and standard deviation
2.5.4 The Coefficient of Variation

Chapter-II Data Analysis

2.1 Introduction

Statistics is a branch of applied mathematics concerned with the collection and interpretation of quantitative data and the use of probability theory to estimate population parameters. Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics.

Data: A collection of values to be used for statistical analysis. A dictionary defines data as facts or figures from which conclusions may be drawn. Technically, data is a collective or plural noun; datum is its singular form. Data may consist of numbers, words, or images, particularly as measurements or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which information and knowledge are derived.

Some books use the terms individual and variable to refer to the objects and characteristics described by a set of data. They also stress the importance of exact definitions of these variables, including the units in which they are recorded. The reason the data were collected is also important.

2.1.1 Types of Data

Data can be classified as either numeric or nonnumeric. Specific terms are used as follows:

I. Qualitative data are nonnumeric. {Poor, Fair, Good, Better, Best}, colors (ignoring any physical causes), and types of material {straw, sticks, bricks} are examples of qualitative data. Qualitative data are often termed categorical data.

II. Quantitative data are numeric. Quantitative data are further classified as either discrete or continuous.
1. Discrete data are numeric data that have a finite number of possible values. A classic example of discrete data is a finite subset of the counting numbers, {1, 2, 3, 4, 5}, perhaps corresponding to {Strongly Disagree … Strongly Agree}. When data represent counts, they are discrete; an example might be how many students were absent on a given day. Counts are usually considered exact and integer.
2. Continuous data have infinitely many possible values: 1.4, 1.41, 1.414, 1.4142, 1.41421, ... The real numbers are continuous, with no gaps or interruptions. Physically measurable quantities of length, volume, time, mass, etc. are generally considered continuous. At the physical (microscopic) level, especially for mass, this may not be true, but for normal life situations it is a valid assumption.

Data analysis is a process of gathering, modeling, and transforming data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names in different business, science, and social science domains.

2.2 Some Definitions

Raw Data: Data collected in original form.

Frequency: The number of times a certain value or class of values occurs.

Frequency Distribution: The organization of raw data in table form with classes and frequencies.

Categorical Frequency Distribution: A frequency distribution in which the data are only nominal or ordinal.

Ungrouped Frequency Distribution: A frequency distribution of numerical data in which the raw data are not grouped.

Grouped Frequency Distribution: A frequency distribution where several numbers are grouped into one class.

Class Limits: Separate one class in a grouped frequency distribution from another. The limits could actually appear in the data, and there are gaps between the upper limit of one class and the lower limit of the next.

Class Boundaries: Separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data and therefore do not appear in the data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower class boundary is found by subtracting half a unit of measurement (0.5 for whole-number data) from the lower class limit, and the upper class boundary is found by adding the same amount to the upper class limit.

Class Width: The difference between the upper and lower boundaries of any class. The class width is also the difference between the lower limits of two consecutive classes or between the upper limits of two consecutive classes. It is not the difference between the upper and lower limits of the same class.

Class Mark (Midpoint): The number in the middle of the class. It is found by adding the upper and lower limits and dividing by two. It can also be found by adding the upper and lower boundaries and dividing by two.

Cumulative Frequency: The number of values less than the upper class boundary for the current class. This is a running total of the frequencies.

Relative Frequency: The frequency divided by the total frequency. Multiplied by 100, this gives the percent of values falling in that class.

Cumulative Relative Frequency (Relative Cumulative Frequency): The running total of the relative frequencies, or equivalently the cumulative frequency divided by the total frequency. Multiplied by 100, it gives the percent of the values that are less than the upper class boundary.

2.3 Frequency Distribution

The distribution of empirical data is called a frequency distribution and consists of a count of the number of occurrences of each value. If the data are continuous, a grouped frequency distribution is used. Typically, a distribution is portrayed using a frequency polygon or a histogram. Mathematical distributions are often used to describe distributions; the normal distribution is perhaps the best-known example. Many empirical distributions are approximated well by mathematical distributions such as the normal distribution.
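The class-related definitions in 2.2 are mechanical enough to compute. The following is a minimal Python sketch, assuming whole-number data (so each boundary lies 0.5 beyond its class limit) and equal-width classes; the function name `grouped_frequency_table` and the sample ages are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClassRow:
    lower_limit: int
    upper_limit: int
    lower_boundary: float
    upper_boundary: float
    midpoint: float                      # class mark
    frequency: int
    cumulative_frequency: int
    relative_frequency: float            # proportion; x 100 for percent
    cumulative_relative_frequency: float


def grouped_frequency_table(data, first_lower_limit, width, n_classes):
    """Grouped frequency distribution for whole-number data.

    Assumes integer data, so boundaries are the class limits -/+ 0.5.
    Values outside the classes are not counted.
    """
    if not data:
        raise ValueError("no data")
    total = len(data)
    rows, running = [], 0
    for k in range(n_classes):
        lo = first_lower_limit + k * width
        hi = lo + width - 1                # e.g. limits 10-14 when width is 5
        freq = sum(1 for x in data if lo <= x <= hi)
        running += freq                    # running total = cumulative frequency
        rows.append(ClassRow(
            lower_limit=lo,
            upper_limit=hi,
            lower_boundary=lo - 0.5,
            upper_boundary=hi + 0.5,
            midpoint=(lo + hi) / 2,
            frequency=freq,
            cumulative_frequency=running,
            relative_frequency=freq / total,
            cumulative_relative_frequency=running / total,
        ))
    return rows


if __name__ == "__main__":
    ages = [12, 15, 18, 21, 22, 24, 25, 27, 29, 30, 31, 33]  # hypothetical data
    print("Limits  Boundaries  Mid  f  cf  rf  crf")
    for r in grouped_frequency_table(ages, first_lower_limit=10, width=5, n_classes=5):
        print(f"{r.lower_limit}-{r.upper_limit}  {r.lower_boundary}-{r.upper_boundary}"
              f"  {r.midpoint:4.1f}  {r.frequency}  {r.cumulative_frequency:2d}"
              f"  {r.relative_frequency:.2f}  {r.cumulative_relative_frequency:.2f}")
```

With these classes the width computed from the boundaries (14.5 - 9.5 = 5) equals the difference between consecutive lower limits, as the definition of class width says, and not the difference between the limits of one class (14 - 10 = 4).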

Grouped Frequency Distribution
A grouped frequency distribution is a frequency distribution in which frequencies are displayed for ranges of data rather than for individual values. For example, the distribution of heights might be calculated by defining one-inch ranges; the frequency of individuals with various heights, rounded off to the nearest inch, would then be tabulated.

2.3.1 Graphical presentation of Frequency distribution

Histogram
A histogram is a graphical display of tabulated frequencies. It is the graphical version of a table that shows what proportion of cases fall into each of several or many specified categories.

(Figure: Example of a histogram of 100 values)

Advantages
• Visually strong
• Can compare to normal curve
• The vertical axis is usually a frequency count of items falling into each category
Disadvantages
• Cannot read exact values because data is grouped into categories
• More difficult to compare two data sets
• Use only with continuous data
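Tabulating heights to the nearest inch, as in the grouped frequency distribution example, amounts to counting rounded values. A rough Python sketch, assuming heights in inches and a hypothetical sample; it prints a sideways text histogram instead of using a plotting library, and keeps empty one-inch classes so that gaps in the data stay visible. The function names are illustrative.

```python
import math
from collections import Counter


def nearest_inch(height):
    """Round half up; Python's built-in round() rounds halves to even."""
    return math.floor(height + 0.5)


def height_frequencies(heights):
    """Frequency of each one-inch class, including empty classes in between."""
    counts = Counter(nearest_inch(h) for h in heights)
    lo, hi = min(counts), max(counts)
    return {inch: counts[inch] for inch in range(lo, hi + 1)}


def text_histogram(freqs, symbol="*"):
    """One row per class; the bar length equals the class frequency."""
    return "\n".join(f"{value:>4} | {symbol * count}" for value, count in freqs.items())


if __name__ == "__main__":
    heights = [64.2, 64.6, 65.1, 65.4, 65.5, 66.0, 66.3, 66.7, 67.2, 68.9]  # hypothetical
    print(text_histogram(height_frequencies(heights)))
```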

Frequency Polygons
Frequency polygons are a graphical device for understanding the shapes of distributions. They serve the same purpose as histograms but are especially helpful in comparing sets of data. Frequency polygons are also a good choice for displaying cumulative frequency distributions.

To create a frequency polygon:
1. Start just as for histograms, by choosing a class interval.
2. Draw an X-axis representing the values of the scores in your data. Mark the middle of each class interval with a tick mark, and label it with the middle value represented by the class.
3. Draw the Y-axis to indicate the frequency of each class.
4. Place a point in the middle of each class interval at the height corresponding to its frequency.
5. Finally, connect the points.
You should include one class interval below the lowest value in your data and one above the highest value; the graph will then touch the X-axis on both sides.

Advantages
• Visually appealing
• Can compare to normal curve
• Can compare two data sets
Disadvantages
• Anchors at both ends may imply zero as data points
• Use only with continuous data

Frequency Curve
A frequency curve is a smooth curve that corresponds to the limiting case of a histogram computed for the frequency distribution of a continuous variable as the number of data points becomes very large.

Advantages
• Visually appealing
Disadvantages
• Anchors at both ends may imply zero as data points
• Use only with continuous data
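Following the construction steps for a frequency polygon, its vertices are just (class midpoint, frequency) pairs plus one empty class at each end. A minimal Python sketch, assuming equal-width classes; the example uses the response-time classes from the mode example in 2.4.3 (midpoints 550 to 1050), and drawing is left to whatever plotting tool is at hand. The function name is illustrative.

```python
def frequency_polygon_points(midpoints, frequencies):
    """(x, y) vertices of a frequency polygon for equal-width classes.

    One empty class is added below the lowest class and one above the
    highest, so the polygon touches the X-axis at both ends.
    """
    if len(midpoints) != len(frequencies) or len(midpoints) < 2:
        raise ValueError("need at least two classes, one frequency per class")
    width = midpoints[1] - midpoints[0]   # assumes equal class widths
    xs = [midpoints[0] - width, *midpoints, midpoints[-1] + width]
    ys = [0, *frequencies, 0]
    return list(zip(xs, ys))


if __name__ == "__main__":
    mids = [550, 650, 750, 850, 950, 1050]
    freqs = [3, 6, 5, 5, 0, 1]
    for x, y in frequency_polygon_points(mids, freqs):
        print(x, y)
```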

2.4 Measure of Central tendency

Central tendency is the center or middle of a distribution. There are many measures of central tendency; the most common are the mean, the median, and the mode. The center of a distribution could be defined in three ways:
1. the point on which a distribution would balance;
2. the value whose average absolute deviation from all the other values is minimized;
3. the value whose squared difference from all the other values is minimized.
The mean is the point on which a distribution would balance, the median is the value that minimizes the sum of absolute deviations, and the mean is also the value that minimizes the sum of squared deviations.

2.4.1 Arithmetic Mean

The arithmetic mean is the most common measure of central tendency. For a given set of data where the observations are x1, x2, …, xn, the arithmetic mean is defined as:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Basically, the mean is the sum of the observations divided by the number of observations. For a data set, the mean describes the central location of the data.

The weighted arithmetic mean is used, for example, if one wants to combine average values from samples of the same population with different sample sizes:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

Example 1: Find the mean of the following observations, using the given weights.

Observation (xi)   Weight (wi)   xi·wi
12                 2             24
15                 5             75
20                 7             140
22                 6             132
30                 1             30
Total              21            401

Mean = 401/21 ≈ 19.10

Advantages
• can be specified using an equation, and therefore can be manipulated algebraically
• is the most sufficient of the three estimators
• is the most efficient of the three estimators
• is unbiased
Disadvantages
• is very sensitive to extreme scores (i.e., low resistance)
• its value is unlikely to be one of the actual data points
• requires an interval scale

Is there anything else about the distribution that we would want to convey to someone if we were describing it to them?
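The two mean formulas translate directly into code. A minimal Python sketch that reproduces Example 1; the function names are illustrative, and Python's standard `statistics.mean` would give the same unweighted result.

```python
def arithmetic_mean(xs):
    """Sum of the observations divided by the number of observations."""
    if not xs:
        raise ValueError("mean of empty data")
    return sum(xs) / len(xs)


def weighted_mean(xs, ws):
    """Sum of w_i * x_i divided by the sum of the weights."""
    if len(xs) != len(ws) or sum(ws) == 0:
        raise ValueError("need one weight per observation and a nonzero total weight")
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)


if __name__ == "__main__":
    obs = [12, 15, 20, 22, 30]
    weights = [2, 5, 7, 6, 1]
    print(round(weighted_mean(obs, weights), 2))   # 19.1  (401 / 21)
    print(arithmetic_mean(obs))                    # 19.8  (weights ignored)
```

The weighted mean is lower than the plain mean here mainly because the largest observation, 30, carries a weight of only 1.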

2.4.2 Median

The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one. If there is an even number of observations, there is no single middle value, so one usually takes the mean of the two middle values:
• For an odd number of observations: Median = the ((n + 1)/2)th observation.
• For an even number of observations: Median = average of the (n/2)th and (n/2 + 1)th observations.

Consider the following sample test scores: 100, 100, 99, 98, 92, 91, 91, 90, 88, 87, 87, 85, 85, 85, 80, 79, 76, 72, 67, 66, 45.

The "middle" score of this group could easily be seen as 87. Why? In the ordered list, exactly half of the other scores lie on each side of it. Thus, 87 is in the middle of this set of scores; this score is known as the median. In this example there are 21 scores. The eleventh score in the ordered set is the median score (87), because ten scores are on either side of it. If there were an even number of scores, say 20, the median would fall halfway between the tenth and eleventh scores in the ordered set. We would find it by adding those two scores together and dividing by two.

Advantages
• is unbiased
• is unaffected by extreme scores (i.e., high resistance)
• doesn't require the use of an interval scale: as long as you can order the scores along some continuum, you can find the median
Disadvantages
• cannot be specified using an equation, so it can't be manipulated algebraically
• is the least sufficient of the three estimators
• is less efficient than the mean
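The odd/even rule for the median can be sketched in Python as follows. This is an illustrative implementation applied to the test scores above; the standard `statistics.median` applies the same rule and is used only as a cross-check.

```python
import statistics


def median(xs):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    s = sorted(xs)
    n = len(s)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                    # the ((n + 1)/2)th value, 0-based index n // 2
    return (s[mid - 1] + s[mid]) / 2     # average of the (n/2)th and (n/2 + 1)th values


if __name__ == "__main__":
    scores = [100, 100, 99, 98, 92, 91, 91, 90, 88, 87, 87,
              85, 85, 85, 80, 79, 76, 72, 67, 66, 45]
    print(median(scores))                               # 87 (the 11th of 21 scores)
    print(median(scores) == statistics.median(scores))  # True
    print(median([1, 3, 5, 8]))                         # 4.0 (even count)
```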

4. The eleventh score in the ordered set is the median score (87). 5. 5. This score is known as the median. 87 is in the middle of this set of scores. the mode is the middle of that interval (650). the frequency of each value is one since no two scores will be exactly the same. 4. high resistance) doesn’t require the use of an interval scale. Since the interval with the highest frequency is 600-700. Range 500-600 600-700 700-800 800-900 900-1000 Frequency 3 6 5 5 0 . The grouped frequency distribution table shows a grouped frequency distribution for the target response time data. Therefore the mode of continuous data is normally computed from a grouped frequency distribution. as long as you can order the scores along some continuum then you can find the median Disadvantage • • can not be specified using an equation so can’t be manipulated algebraically is the least sufficient of the three estimators • is less efficient than the mean 2. the median would fall halfway between the tenth and eleventh scores in the ordered set. Why? Exactly half of the scores lie above 87 and half lie below it.3 Mode The mode is the most frequently occurring value. Thus. say 20. because ten scores are on either side of it..4. It is the most common value in a distribution: The mode of 3. Note that the mode may be very different from the mean and the median. With continuous data such as response time measured to many decimals. there are 21 scores. In this example. 5. We would find it by adding the two scores (the tenth and eleventh scores) together and dividing by two.The "middle" score of this group could easily be seen as 87.e. Advantages • • • is unbiased is unaffected by extreme scores (i. 8 is 5. If there were an even number of scores.

2.5 Measure of Dispersion

Measures of dispersion summarize how much the points in a data set vary, e.g., how spread out or how volatile they are. In measuring dispersion, it is necessary to know both the amount of variation and the degree of variation. The former is given by absolute measures of dispersion, expressed in the same units as the original variates; the latter is given by relative measures of dispersion. For comparison between two or more series with different sizes or numbers of items, different central values, or different units of calculation, only relative measures can be used.

Absolute measures can be divided into positional measures, based on particular items of the series, such as (i) the range and (ii) the quartile deviation or semi-interquartile range, and measures based on all items in the series, such as (i) the mean deviation and (ii) the standard deviation. The relative measures corresponding to each of these are called the coefficients of the respective measures.

The following are the important methods of studying variation:
1. Range
2. Mean deviation
3. Standard deviation and variance (which is closely related to standard deviation)
4. The coefficient of variation

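2.5.1 Range

The range is the simplest of the summary measures of variation. It is computed as the difference between the largest value (H) and the smallest value (L) in a data set:

Absolute range: Range = H - L

$$\text{Relative range (coefficient of range)} = \frac{H - L}{H + L} = \frac{\text{difference of the two extremes}}{\text{sum of the two extremes}}$$

It is also the crudest measure and the most prone to error, since it depends only on the two extreme values.

For example, for the data set {2, 2, 3, 4, 14}:
Range = 14 - 2 = 12
Coefficient of range = (14 - 2)/(14 + 2) = 12/16 = 0.75

Example: You are given the following data: 3, 6, 9, 11. Compute the sample range.
Solution: H = 11, L = 3, so Range = H - L = 11 - 3 = 8.

2.5.2 Mean Deviation

Mean deviation can be calculated from any measure of central tendency, viz. the mean, the median, or the mode. Accordingly, mean deviation can be of the following types:
• Mean deviation about the mean
• Mean deviation about the median
• Mean deviation about the mode

Mean deviation about the mean:

$$\text{M.D.} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert$$

For example, for the data set {2, 2, 3, 4, 14}, the mean is 5, so
M.D. = (|2 - 5| + |2 - 5| + |3 - 5| + |4 - 5| + |14 - 5|)/5 = 18/5 = 3.6

Properties of mean deviation about the mean:
• The average absolute deviation from the mean is less than or equal to the standard deviation.
• The mean absolute deviation is a common measure of forecast error in time series analysis.
• The mean of the signed deviations from the mean (taken without absolute values) is always zero, which is why the absolute values are used.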
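The range, the coefficient of range and the three kinds of mean deviation are short computations. A Python sketch with illustrative function names, using the standard `statistics` module for the centres; note that `statistics.mode` returns a single value, so for multimodal data it picks the first mode encountered (Python 3.8+ behaviour).

```python
import statistics


def value_range(xs):
    """Range = H - L (largest minus smallest value)."""
    return max(xs) - min(xs)


def coefficient_of_range(xs):
    """Relative range (H - L) / (H + L)."""
    h, l = max(xs), min(xs)
    return (h - l) / (h + l)


def mean_deviation(xs, about="mean"):
    """Average absolute deviation from the mean, median or mode."""
    centres = {"mean": statistics.mean,
               "median": statistics.median,
               "mode": statistics.mode}
    c = centres[about](xs)
    return sum(abs(x - c) for x in xs) / len(xs)


if __name__ == "__main__":
    data = [2, 2, 3, 4, 14]
    print(value_range(data), coefficient_of_range(data))   # 12 0.75
    print(mean_deviation(data))                            # 3.6  (18 / 5)
    print(mean_deviation(data, about="median"))            # about the median, 3
    print(value_range([3, 6, 9, 11]))                      # 8
```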

2.5.3 Variance and standard deviation

Variance and standard deviation are the most common of all the measures of variation. Variance is a measure of statistical dispersion, indicating how the possible values are spread around the mean. Thus, variance indicates the variability of the values; a smaller value implies a smaller variation from the mean. For observations x1, …, xn with mean x̄, the variance (dividing by n, as in the example below) is

$$\text{Variance} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

The positive square root of the variance is called the standard deviation:

$$\text{S.D.} = \sqrt{\text{Variance}}$$

Let us consider an example:

Xi       Xi - Mean    (Xi - Mean)²
4        -1           1
6         1           1
5         0           0
5         0           0
Total 20              2

Mean = 20/4 = 5
Variance = (1/4) × 2 = 1/2
S.D. = √(1/2) ≈ 0.707
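The worked variance example can be reproduced in Python. This sketch divides by n, matching the example; the standard library offers both conventions, `statistics.pvariance` (divide by n) and `statistics.variance` (divide by n - 1, the usual sample variance), so it is worth checking which one a given course means by "the" variance. The function names are illustrative.

```python
import math
import statistics


def variance(xs):
    """Mean of squared deviations from the mean (divides by n, as in the example)."""
    n = len(xs)
    if n == 0:
        raise ValueError("variance of empty data")
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n


def standard_deviation(xs):
    """Positive square root of the variance."""
    return math.sqrt(variance(xs))


if __name__ == "__main__":
    values = [4, 6, 5, 5]
    print(variance(values))                        # 0.5  (2 / 4)
    print(round(standard_deviation(values), 3))    # 0.707
    print(statistics.pvariance(values))            # same n-divisor convention
    print(statistics.variance(values))             # n - 1 divisor: 2 / 3
```

The mean absolute deviation of the same values is (1 + 1 + 0 + 0)/4 = 0.5, which is indeed no larger than the standard deviation, as the property listed under 2.5.2 states.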

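2.5.4 The Coefficient of Variation

The coefficient of variation is a measure of variation expressed as a percentage of the sample mean:

$$CV = \frac{S}{\bar{x}} \times 100\%$$

where S is the standard deviation and x̄ is the mean.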
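A minimal Python sketch of the coefficient of variation, applied to the values 4, 6, 5, 5 from the variance example. It assumes S is the n-divisor standard deviation, since the text does not say which divisor S uses; the function name is illustrative.

```python
import math


def coefficient_of_variation(xs):
    """Standard deviation as a percentage of the mean.

    Assumption: S is the n-divisor standard deviation; the text does
    not specify the divisor.
    """
    n = len(xs)
    if n == 0:
        raise ValueError("no data")
    m = sum(xs) / n
    if m == 0:
        raise ValueError("coefficient of variation is undefined when the mean is 0")
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return s / m * 100


if __name__ == "__main__":
    print(round(coefficient_of_variation([4, 6, 5, 5]), 2))   # 14.14  (S ≈ 0.707, mean 5)
```

Because the result has no units, multiplying every value by the same factor (for example, changing units) leaves it unchanged; this is what makes it usable for comparing series measured in different units, the situation in which 2.5 says only relative measures apply.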
