
Data

Numbers or observations, usually obtained by some process of counting or
measurement, are referred to collectively as data (the raw material of statistics).
Examples are the heights and weights of the students of a class. As another example,
we can record the number of telephones that several workers install on a given day,
or that one worker installs per day over a period of several days, and call the
results our data. A collection of data is called a data set and a single observation a
data point.
Reasons for obtaining data
1. Data are needed to provide the necessary input to a survey.
2. Data are needed to provide the necessary input to a study.
3. Data are needed to measure performance of an ongoing service or production
process.
4. Data are needed to evaluate conformance to standards.
5. Data are needed to assist in formulating alternative courses of action in a
decision-making process.
6. Data are needed to satisfy our curiosity.
Cross-sectional and time series data. Cross-sectional data are data collected at
the same or approximately the same point in time. Time series data are data
collected over several time periods.
Variable
Suppose a man is an element of a population and possesses certain characteristics, such
as height, weight, age, hair color, etc. Each of these characteristics varies from man
to man, either in magnitude or in quality, and is therefore called a variable.
There are three basic ways of classifying a data set: (i) by the number of variables
(univariate, bivariate or multivariate), (ii) by the kind of information (numbers or
categories) represented by each variable and (iii) by whether the data set is a time
sequence or comprises cross-sectional data (cross-sectional is just a fancy way of
saying that no time sequence is involved, e.g., the first quarter 1996 earnings of eight
aerospace firms).
Univariate (one-variable) data sets have just one piece of information recorded for
each item. For example, only the heights of the students of a class.
Bivariate (two-variable) data sets have exactly two pieces of information recorded
for each item. For example, the heights and weights of the students of a class.
For bivariate data, in addition to looking at each variable as a univariate data set, you
can study the relationship between the two variables and predict one variable from
the other.

Multivariate (many-variable) data sets have three or more pieces of information recorded for each item. For example, the heights, weights and length of forearms of the students of a class. Also with multivariate data, you can look at each variable individually, as well as examine the relationship among the variables and predict one variable from the others.

Types of Data
Variables may be either quantitative or qualitative. A quantitative variable can be measured, while a qualitative variable can only be categorized. For example, the height of a man, the yield of a crop and the price of a commodity are quantitative variables, while hair color is a qualitative variable. Some categories of hair color are black hair, golden hair and white hair. The characteristic used to classify an individual into different categories is called an attribute.
A quantitative or measurable variable may be of two types: discrete and continuous.

Discrete variable
When a variable can assume only isolated values, it is called a discrete variable. For example, if the number of children in a family is the variable of interest, it is obvious that it cannot assume fractional values, so it is a discrete variable.

Continuous variable
A continuous variable is one that can take any value within some range. The height of a man is a continuous variable since it can take any value, which may be either an integral number of inches or a fraction of an inch.

Another example of types of data

Data Type                 Question Type                                          Response
Categorical               Do you currently own U.S. Government Savings Bonds?    Yes / No
Numerical (discrete)      To how many magazines do you currently subscribe?      3 (number)
Numerical (continuous)    How tall are you?                                      67¼ inches

Source: Berenson and Levine, Basic Business Statistics: Concepts and Applications, seventh edition, 1999, page 25.

Levels of Measurement and Types of Measurement Scales
(Source: Berenson and Levine, page 27)
Statistical data may be broadly classified as categorical and numerical. Categorical data are of two types, nominal and ordinal, while numerical data are measured on an interval scale or a ratio scale.

Attributes. The distinct categories of a qualitative variable are sometimes called attributes. A worker, when reported to be smoking, is assigned to the category smoker. His smoking behavior is used to classify him as a smoker and thus it is an attribute.

Nominal data. All qualitative measurements are nominal, regardless of whether the categories are designated by names (red, white, male) or numerals (June 20, Room 10). Examples: religion, urban-rural, political affiliation, etc.

Example of nominal scaling

Categorical Variable           Categories
Automobile ownership           Yes, No
Political party affiliation    Democrat, Republican, Independent, Other

Ordinal data. When there is an ordered relationship among the categories, the variable is said to be an ordinal variable. For example, we can classify level of knowledge as good, average and poor, or education level as illiterate, primary and secondary.
Example of ordinal data: job classification such as president, vice-president, departmental head and associate department head.

Example of ordinal scaling

Categorical Variable    Ordered Categories
Product satisfaction    Lowest to highest: Very Unsatisfied, Fairly Unsatisfied, Neutral, Fairly Satisfied, Very Satisfied
Faculty rank            Highest to lowest: Professor, Associate Professor, Assistant Professor, Lecturer

Interval Scale: Data generated through the measurement of an interval variable are called interval data. A thermometer, for example, measures temperatures in degrees, which are the same size at any point on the scale. The difference between 20°C and 21°C is the same as the difference between 12°C and 13°C. Examples: I.Q. test score, calendar time (3 p.m. to 6 p.m.), etc.

Ratio Scale: Ratio data have all the ordering and distance properties of interval data. In addition, a 'zero point' can be meaningfully designated. For example, it is quite meaningful to say that a 4-foot-tall boy is twice as tall as a 2-foot-tall boy. Examples: height, weight, distance (in km), fat consumed (in gm) recorded for each of a group of executives, etc.
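A short numerical check may make the difference between the interval and ratio scales more concrete. The sketch below is only an illustration: the helper name celsius_to_fahrenheit and the sample readings are assumptions made here, not part of the notes. It suggests that equal temperature differences stay equal after a change of unit while temperature ratios do not, whereas a height ratio is the same in feet and in inches.

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius reading to Fahrenheit (hypothetical helper)."""
    return c * 9 / 5 + 32


# Interval scale: equal differences remain equal after a change of unit.
diff_a = celsius_to_fahrenheit(21) - celsius_to_fahrenheit(20)
diff_b = celsius_to_fahrenheit(13) - celsius_to_fahrenheit(12)

# Ratios are not meaningful on an interval scale because the zero is arbitrary:
# 20 C looks like "twice" 10 C, but the same readings in Fahrenheit are 68 and 50.
ratio_c = 20 / 10
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio scale: zero is a true zero, so the ratio survives a change of unit.
ratio_feet = 4 / 2
ratio_inches = (4 * 12) / (2 * 12)

print("temperature differences (F):", diff_a, diff_b)
print("temperature ratios (C, F):", ratio_c, ratio_f)
print("height ratios (feet, inches):", ratio_feet, ratio_inches)
```

The two temperature ratios disagree, which is why "twice as hot" is not a meaningful statement on the Celsius or Fahrenheit scale, while "twice as tall" is meaningful for height.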

Numerical Variable                                 Level of Measurement
Temperature (in degrees Celsius or Fahrenheit)     Interval
Calendar time (Gregorian, Hebrew or Islamic)       Interval
Height (in inches or centimeters)                  Ratio
Weight (in pounds or kilograms)                    Ratio
Age (in years or days)                             Ratio
Salary (in American dollars or Japanese yen)       Ratio

Sources of data
Primary data. When the investigator collects first-hand data for the purpose at hand, such data are known as primary data.
Secondary data. When the investigator obtains the data from published or unpublished government, industrial or individual sources, such data constitute secondary data.

Technique of data collection
There are five important techniques of data collection, namely (i) census, (ii) sample survey, (iii) focus group discussion, (iv) telephone interview and (v) data collection through electronic media. In addition, (vi) we may design an experiment to obtain the necessary data.

Classification
Classification is the process of arranging individuals in groups or classes according to their affinities.
Types of Classification. Broadly, the data can be classified on the following four bases:
1. Geographical, i.e., area-wise, e.g., cities, districts, etc.
2. Chronological, i.e., on the basis of time.
3. Qualitative, i.e., according to some attributes.
4. Quantitative, i.e., in terms of magnitudes.

1. Geographical Classification. In this type of classification, data are classified on the basis of geographical or locational differences between the various items, like States, cities, regions, zones, areas, etc. For instance, the production of foodgrains in India may be presented State-wise in the following manner:

State-wise Estimates of Production of Foodgrains: 1987-88

Name of State      Total Foodgrains (Thousand tonnes)
Andhra Pradesh     ...
Bihar              ...
Haryana            ...
Punjab             ...
Uttar Pradesh      ...
All India          ...

Geographical classifications are usually listed in alphabetical order for easy reference.

2. Chronological Classification. This type of statistical data is classified according to the time of its occurrence, such as years, months, weeks, days, hours, etc. Time series are also called chronological classification. They are further classified into those relating to a period of time and those relating to a point of time. For example, census data are expressed in decades, national income is expressed every year, and departmental sales are expressed every month or week. Statistical data regarding population, sales in a firm, imports, exports, etc., also come under this classification. Chronological classification is illustrated below:

Population of India from 1921 to 1981

Year    Population (in million)
1921    248
1931    276
1941    313
1951    357
1961    438
1971    536
1981    684

3. Qualitative Classification. When the data are classified according to some quality or attribute, such as sex, colour, religion, literacy, honesty, intelligence, marital status, blindness, deafness, etc., the classification is termed qualitative or descriptive classification. This again can be classified into two types: (a) simple classification and (b) manifold classification.

(a) Simple classification. In this type we can only find out the presence or absence of the attribute. If the data are classified into only two classes, the classification is termed simple classification. This classification is normally a dichotomy, or twofold, such as literate and illiterate, honest and dishonest, or skilled and unskilled. For example:

Population
    Male
    Female

Population
    Literate
    Illiterate

(b) Manifold classification. In manifold classification, the universe is classified on the basis of more than one attribute at a time. For example, we may first divide the population into males and females on the attribute of sex, then further divide them on the basis of literacy, and so on:

Population
    Male
        Literate:   Married, Unmarried
        Illiterate: Married, Unmarried
    Female
        Literate:   Married, Unmarried
        Illiterate: Married, Unmarried

4. Quantitative Classification. Quantitative classification refers to the classification of data according to some characteristic that can be measured in given units, such as height, weight, income, sales, profits, production, etc. For example, the students of a college may be classified according to weight as follows:

Weight (in lbs)    No. of Students
90 – 100           50
100 – 110          200
110 – 120          260
120 – 130          360
130 – 140          90
140 – 150          40
Total              1,000
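The idea of a quantitative classification can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: the function name classify and the twelve weights are hypothetical, and each class is assumed to include its lower limit and exclude its upper limit, a convention the notes do not state.

```python
from collections import Counter


def classify(values, lower, upper, width):
    """Count how many values fall in each class [start, start + width).

    Assumption: a value equal to a class boundary goes into the upper class.
    """
    counts = Counter()
    for v in values:
        if lower <= v < upper:
            start = lower + ((v - lower) // width) * width
            counts[start] += 1
    # List every class in order, including classes with no observations.
    return [(s, s + width, counts[s]) for s in range(lower, upper, width)]


# Hypothetical weights (in lbs) of twelve students.
weights = [95, 104, 108, 112, 117, 119, 121, 125, 126, 128, 133, 141]

table = classify(weights, 90, 150, 10)
for start, end, n in table:
    print(f"{start} - {end}: {n}")
print("Total:", sum(n for _, _, n in table))
```

The printed rows form a small frequency table of the same shape as the weight table for the students of a college.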

Such a distribution is known as an empirical frequency distribution or simple frequency distribution.

Series. Series represented by a discrete variable are called discrete series. Series that can be described by a continuous variable are called continuous series. The following are two examples of discrete and continuous frequency distributions:

No. of Children    Frequency
0                  10
1                  40
2                  80
3                  100
4                  250
5                  150
6                  50
Total              680

Weight (lbs.)      No. of Persons
100 – 110          10
110 – 120          15
120 – 130          40
130 – 140          45
140 – 150          20
150 – 160          4
Total              134

Tabulation of Data
Tabulation. By tabulation we mean a systematic presentation of numerical data in columns and rows in accordance with some salient features or characteristics. Columns are vertical arrangements and rows are horizontal arrangements.
Objects. Tabulation helps in understanding complex numerical data and presents them in a simple and clear way, so that similar and dissimilar facts are separated.

Parts of Tabulation
A good statistical table is an art. The following parts must be present in all tables:
1. Table number
2. Title
3. Head-note
4. Caption
5. Stubs
6. Body of the table
7. Foot-note
8. Source-note

1. Table number. A table should always be numbered for identification and future reference. Each column should also be numbered as shown in the illustration.
2. Title of the table. Each table should be given a suitable title. It must be written at the top of the table. It must describe the contents of the table. It must explain (1) what the data are, (2) where the data are, (3) the time or period of the data and (4) how the data are classified. Example: District-wise distribution of infant mortality in Bangladesh in 2007.
3. Head-note. It is a statement given below the title and enclosed in brackets. For example, the unit of measurement, such as 'in millions' or 'in crores', is written as a head-note.
4. Captions. These are headings for the vertical columns. They must be brief and self-explanatory. They have main headings and sub-headings and must be written in small letters. Main heading: Population. Sub-headings: Male and Female.
5. Stubs. These are the headings or designations for the horizontal rows. Stubs are wider than columns. Ages: 14 – 19, 19 – 24, etc.
6. Body of the table. It is the most important part of the table. It contains the numerical information. The arrangement in the body is generally from left to right in rows and from top to bottom in columns.
7. Foot-note. If any explanation or elaboration regarding any item is necessary, foot-notes should be given. Truncated class interval, i.e., every class interval is 5 years but the last one is 3 years.
8. Source-note. It refers to the source from which the information has been taken. It is useful to the reader for checking the figures and gathering additional information. Source: Pocket Statistical Year Book, 2007.

4 16. page No.8 15.STRUCTURE OF A TABLE Number Title (Head-note if any) Stub Heading.0 16.8 15.9 16.4 16. It arranged values in ascending or descending order.4 15.6 15.8 15. Heading Heading (2) (1) Total Col.8 16.3 16.4 16.0 16. It is “raw” because it is unprocessed by statistical methods. 1997.8 16.3 16.7 15.3 16.6 15.7 16.8 16.8 16.2 16.6 15.6 15.9 15.2 15.0 16.9 15.0 16.0 16. Data Array Data Array is one of the simplest ways to present data.9 15. Seventh edition.8 16.7 15.3 Source: Levin and Rubin. 8.8 15.2 15.2 15. Caption Col.2 15. The above Carpet data rearranges in a data array in ascending order as follows: 15.4 Yards produced yesterday by each of 30 Carpet Looms 15. Statistics for Management.6 16.9 16.0 16.2 16.9 16.6 15.6 15.0 16. (4) (3) Stub entries Body Total Foot-note: Source: Raw Data Information or observation before it is arranged and analyzed is called raw data.0 16.7 16.6 15.4 15.1 16.8 15.9 16.9 15.9 .1 15.3 16.3 16. Example of raw data 16.9 15.

Ordered Array
If we place the raw data in order, from the smallest to the largest observation, the ordered sequence obtained is called an ordered array.

Advantages of data arrays
1. We can quickly notice the lowest and highest values in the data.
2. We can easily divide the data into sections.
3. We can see whether any values appear more than once in the array.
4. We can observe the distance between succeeding values in the data.

Disadvantages of data arrays
In spite of these advantages, sometimes a data array is not helpful. Because it lists every observation, it is a cumbersome form for displaying large quantities of data. We need to compress the information and still be able to use it for interpretation and decision making.

The Stem and Leaf Display
The stem and leaf display is a valuable and versatile tool for organizing a set of data and understanding how the values distribute and cluster over the range of the observations in the set of data. A stem and leaf display separates data entries into leading digits, or stems, and trailing digits, or leaves.

How to construct a stem and leaf display
1. Define the stem and leaf you wish to use. You will probably wish to choose the stem so that the number of possible stems in the display is not too large.
2. Write the stems in a column, from the smallest stem at the top to the largest at the bottom.
3. Record the leaf for each observation in the row corresponding to its stem.

Advantages
1. It gives us a good graphical picture of a small data set.
2. If we want to recover the original data from a stem and leaf display, we can readily reconstruct the values of the observations by recombining the leaves with the stems.
3. We can use the stem and leaf display for data management. The construction of the stem and leaf display automatically arranges the observations in ordered sets. This makes it easy to arrange the observations from smallest to largest and, for example, to find the observation in the middle of this ordered arrangement.

Disadvantages
One disadvantage of the stem and leaf display is that it is awkward to control the number of stems. A more obvious disadvantage is that the stem and leaf display is unsuitable when the number of observations in the data set is large.
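The three construction steps can be sketched in code. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names stem_and_leaf and show are hypothetical, the observations are made up, and the stem is taken to be every digit except the last of a non-negative whole number.

```python
from collections import defaultdict


def stem_and_leaf(values, ordered=True):
    """Return {stem: [leaves]} for non-negative whole numbers.

    Step 1 (assumed definition): stem = all digits but the last, leaf = last digit.
    """
    rows = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)  # Step 3: record each leaf in its stem's row.

    # Step 2: stems from smallest to largest, keeping empty stems in between.
    display = {}
    for stem in range(min(rows), max(rows) + 1):
        leaves = rows.get(stem, [])
        display[stem] = sorted(leaves) if ordered else list(leaves)
    return display


def show(display):
    """Print one row per stem in the usual 'stem | leaves' layout."""
    for stem, leaves in display.items():
        print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")


sample = [23, 31, 27, 45, 38, 21, 34, 38]  # hypothetical observations
show(stem_and_leaf(sample, ordered=False))  # leaves in the order encountered
print()
show(stem_and_leaf(sample))                 # leaves arranged in rank order
```

Passing ordered=False keeps the leaves in the order in which the observations were met, and the default arranges each row in rank order. The function expects at least one observation.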

Disadvantage. With a large data set, the number of leaves in the stem rows becomes too large.
Source: Statistics for Business and Economics, page 29, and Basic Business Statistics: Concepts and Applications, page 55.

Example 1
Construct a stem and leaf display for the following measurements:

.18  .09  .02  .14  .42  .01  .12  .27  .32  .08
.06  .07  .03  .08  .01  .02  .11  .20  .22  .33

Solution
To construct a stem and leaf display, first we define the stem and leaf. For the stem, use the first digit after the decimal point, thus giving stems .0, .1, .2, .3 and .4. We will then write the stems in a column and record the leaf for each observation.

Stem | Leaf
 .0  | 1 1 2 2 3 6 7 8 8 9
 .1  | 1 2 4 8
 .2  | 0 2 7
 .3  | 2 3
 .4  | 2

Example 2
The table contains the price-earnings (P/E) ratios for samples of firms from the electronics industry and the auto parts industry.

Auto Parts Firm        P/E ratio    Electronics Firm        P/E ratio
Lear Siegler           11           AMP                     28
Purolator              15           Raytheon                13
Easco                  14           General Instrument      14
Genuine Parts          15           Intel                   55
Federal Mogul          12           Avnet                   27
PPG Industries         12           Perkin Elmer            24
AO Smith               35           TRW                     15
Borg-Warner            12           Motorola                26
Hoover Universal       12           Hewlett-Packard         22
Libbey-Owens-Ford      23           Honeywell               12
Dana                   23           American District       11
Champion Spark Plug    18           Corning Glass Works     15
Dayco                  39           Gould                   18
Sheller-Globe          15           EG&G                    22
Arvin Industries       16           Varian Associates       26

a. Construct a stem and leaf display for each of these data sets.
b. What do your stem and leaf displays suggest about the level of the P/E ratios of firms in the electronics industry as compared to firms in the auto parts industry? Explain.

Auto Parts
Stem | Leaf
  1  | 1 5 4 5 2 2 2 2 8 5 6
  2  | 3 3
  3  | 5 9

Electronics
Stem | Leaf
  1  | 3 4 5 2 1 5 8
  2  | 8 7 4 6 2 2 6
  3  |
  4  |
  5  | 5

From the stem and leaf charts, we can see that for auto parts the P/E ratio is usually below 20, while for electronics it is usually below 30.

Levin and Rubin, page 132.
Table 3.22: Grades on midterm quiz of 27 students

79  78  78  67  76  87  85  73  66
99  84  72  66  57  94  84  72  63
51  48  50  61  71  82  93  100  89

To produce a stem and leaf display for the data in Table 3.22, we make a vertical list of stems (the first digits of each data item) like this:

 4
 5
 6
 7
 8
 9
10

Then we draw a vertical line to the right of these stems and list the leaves (the next digit for all the stems) to the right of the line, in the order that we encountered them in the original data set.

 4 | 8
 5 | 7 1 0
 6 | 7 6 6 3 1
 7 | 9 8 8 6 3 2 2 1
 8 | 7 5 4 4 2 9
 9 | 9 4 3
10 | 0

Finally, we arrange all of the leaves in each row in rank order.

 4 | 8
 5 | 0 1 7
 6 | 1 3 6 6 7
 7 | 1 2 2 3 6 8 8 9
 8 | 2 4 4 5 7 9
 9 | 3 4 9
10 | 0

If we pick the row 9 | 3 4 9, it means there are three items in the data set that begin with nine (93, 94 and 99).
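Reading a row back into data values, as done above for the stem 9, can be applied to the whole display. The sketch below is only an illustration: the dictionary simply transcribes the rank-ordered display of the midterm grades, and the variable names are chosen here, not taken from the source. It recombines stems and leaves and then picks the middle observation of the ordered grades.

```python
# Rank-ordered stem and leaf display of the midterm grades (stem -> leaves).
display = {
    4: [8],
    5: [0, 1, 7],
    6: [1, 3, 6, 6, 7],
    7: [1, 2, 2, 3, 6, 8, 8, 9],
    8: [2, 4, 4, 5, 7, 9],
    9: [3, 4, 9],
    10: [0],
}

# Recombine each stem with its leaves, e.g. stem 9 with leaves 3, 4, 9
# gives 93, 94 and 99. Stems and leaves are already in rank order,
# so the recovered grades come out sorted.
grades = [stem * 10 + leaf for stem, leaves in display.items() for leaf in leaves]

n = len(grades)
# With an odd number of observations there is a single middle value.
middle = grades[n // 2]

print("number of grades:", n)
print("recovered grades:", grades)
print("middle observation:", middle)
```

Because the number of grades is odd, indexing at n // 2 gives the single middle observation. With an even count, one common convention is to average the two central values.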