# Qualitative data

Qualitative data is a categorical measurement expressed not in terms of numbers, but rather by means of a natural language description. In statistics, the term is often used interchangeably with "categorical" data. For example:

- favorite color = "yellow"
- height = "tall"

When the categories can be ordered, the variables are called ordinal variables. Categorical variables that judge size (small, medium, large, etc.) are ordinal variables. Attitudes (strongly disagree, disagree, neutral, agree, strongly agree) are also ordinal variables; however, we may not know which value is the best or the worst for a given issue. Note that the distance between these categories is not something we can measure.
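As a small illustration of the ordinal idea described above, the following Python sketch orders attitude responses by their rank on the scale and picks the middle response. The names `SCALE`, `RANK`, `sort_responses` and `median_response`, and the sample responses, are assumptions made for this example only. The sketch relies only on the order of the categories, never on a distance between them.

```python
# Hypothetical five-point attitude scale, listed from lowest to highest.
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
RANK = {label: position for position, label in enumerate(SCALE)}


def sort_responses(responses):
    """Order responses by their rank on the scale, not alphabetically."""
    return sorted(responses, key=RANK.__getitem__)


def median_response(responses):
    """Return the middle response (the lower middle one for even counts)."""
    ordered = sort_responses(responses)
    return ordered[(len(ordered) - 1) // 2]


responses = ["agree", "neutral", "strongly agree", "disagree", "agree"]
print(sort_responses(responses))
print(median_response(responses))
```

Subtracting or averaging the rank numbers would treat the categories as equally spaced, which is exactly what an ordinal scale does not guarantee.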

# Quantitative data

Quantitative data is a numerical measurement expressed not by means of a natural language description, but rather in terms of numbers. However, not all numbers are continuous and measurable. For example, a social security number is a number, but not something that one can add or subtract. Examples of quantitative data:

- favorite color = "450 nm"
- height = "1.8 m"

Quantitative data are always associated with a scale of measurement. Probably the most common scale type is the ratio scale. Observations of this type are on a scale that has a meaningful zero value and also an equidistant measure (i.e., the difference between 10 and 20 is the same as the difference between 100 and 110). For example, a 10-year-old girl is twice as old as a 5-year-old girl. Since you can measure zero years, time is a ratio-scale variable. Money is another common ratio-scale quantitative measure. Observations that you count are usually ratio-scale (e.g., number of widgets).

A more general quantitative measure is the interval scale. Interval scales also have an equidistant measure. However, the doubling principle breaks down on this scale. A temperature of 50 degrees Celsius is not "half as hot" as a temperature of 100 degrees Celsius, but a difference of 10 degrees indicates the same difference in temperature anywhere along the scale. The Kelvin temperature scale, however, constitutes a ratio scale because zero on the Kelvin scale is absolute zero, the lowest possible temperature. So one can say, for example, that 200 kelvins is twice as hot as 100 kelvins.
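The contrast between the interval and ratio scales can be checked numerically. The sketch below is only an illustration: the helper name `celsius_to_kelvin` is an assumption of this example, and the offset of 273.15 is the standard Celsius-to-Kelvin conversion. It shows that differences agree on both scales while ratios do not.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to kelvins."""
    return celsius + 273.15


cold_c, hot_c = 50.0, 100.0
cold_k, hot_k = celsius_to_kelvin(cold_c), celsius_to_kelvin(hot_c)

# Differences agree on both scales (the interval property).
diff_c = hot_c - cold_c
diff_k = hot_k - cold_k

# Ratios are only meaningful on the Kelvin scale, which has a true zero.
ratio_c = hot_c / cold_c  # 2.0, but this number has no physical meaning
ratio_k = hot_k / cold_k  # roughly 1.15

print(diff_c, round(diff_k, 6))
print(ratio_c, round(ratio_k, 3))
```

In other words, 100 degrees Celsius is only about 15% "hotter" than 50 degrees Celsius in the ratio sense, not twice as hot.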

# The differences between qualitative and quantitative data

Quantitative data relates to, measures, or is measured by the quantity of something rather than its quality (e.g., the number of people in a town). Qualitative data is data that can be captured but is not numerical in nature (e.g., the color of people's skin). Essentially, the distinction is that quantitative data deals with numbers and numerical values of what is being tested, whereas qualitative data deals with the quality of what is being tested.

Quantitative data can only be described in numbers, whereas qualitative data cannot be described in numbers.

# Cross-sectional data

Cross-sectional data, or a cross section of a study population, is a type of one-dimensional data set in statistics and econometrics.[1] It refers to data collected by observing many subjects (such as individuals, firms or countries/regions) at the same point of time, or without regard to differences in time. Analysis of cross-sectional data usually consists of comparing the differences among the subjects.

For example, suppose we want to measure current obesity levels in a population. We could draw a sample of 1,000 people randomly from that population (also known as a cross section of that population), measure their weight and height, and calculate what percentage of that sample is categorized as obese. Suppose 30% of our sample were categorized as obese. This cross-sectional sample provides us with a snapshot of that population at that one point in time. Note that we do not know, based on one cross-sectional sample, whether obesity is increasing or decreasing; we can only describe the current proportion.

Cross-sectional data differs from time series data, also known as longitudinal data, which follows one subject's changes over the course of time. Another variant, panel data (or time-series cross-sectional (TSCS) data), combines both and looks at multiple subjects and how they change over the course of time. Panel analysis uses panel data to examine changes in variables over time and differences in variables between subjects.

In a rolling cross-section, both the presence of an individual in the sample and the time at which the individual is included in the sample are determined randomly. For example, a political poll may decide to interview 100,000 individuals. It first selects these individuals randomly from the entire population. It then assigns a random date to each individual. This is the date on which that individual will be interviewed, and thus included in the survey.

# Time Series

A time series is a sequence of observations which are ordered in time (or space). If observations are made on some phenomenon throughout time, it is most sensible to display the data in the order in which they arose, particularly since successive observations will probably be dependent. Time series are best displayed in a scatter plot. The series value X is plotted on the vertical axis and time t on the horizontal axis. Time is called the independent variable (in this case, however, something over which you have little control). There are two kinds of time series data:

1. Continuous, where we have an observation at every instant of time, e.g. lie detectors, electrocardiograms. We denote this using observation X at time t, X(t).
2. Discrete, where we have an observation at (usually regularly) spaced intervals. We denote this as X_t.

Examples:

- Economics: weekly share prices, monthly profits
- Meteorology: daily rainfall, wind speed, temperature
- Sociology: crime figures (number of arrests, etc.), employment figures

In statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering and communications engineering, a time series is a sequence of data points, measured typically at successive time instants spaced at uniform time intervals. Examples of time series are the daily closing value of the Dow Jones index or the annual flow volume of the Nile River at Aswan. Time series are very frequently plotted via line charts.

Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future values based on previously observed values.
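The remark that successive observations of a time series will probably be dependent can be made concrete with a sample autocorrelation. The sketch below is a minimal illustration, not a full time series analysis: the function name `lag_autocorrelation` and the temperature readings are assumptions of this example, and the estimator used is the common one that divides the lagged sum of products by the total sum of squared deviations.

```python
from statistics import mean


def lag_autocorrelation(series, lag=1):
    """Sample autocorrelation of a discrete series X_t at the given lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    centre = mean(series)
    deviations = [x - centre for x in series]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    numerator = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
    return numerator / denominator


# Hypothetical daily temperatures (degrees Celsius), one observation per day.
temperatures = [12.0, 13.0, 15.0, 16.0, 18.0, 19.0, 21.0, 22.0]
print(round(lag_autocorrelation(temperatures, lag=1), 3))
```

A value well above zero, as for this steadily rising series, indicates that each observation resembles the one before it; a series that alternates up and down gives a negative value instead.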

Time series data have a natural temporal ordering. This makes time series analysis distinct from other common data analysis problems, in which there is no natural ordering of the observations (e.g. explaining people's wages by reference to their respective education levels, where the individuals' data could be entered in any order). Time series analysis is also distinct from spatial data analysis, where the observations typically relate to geographical locations (e.g. accounting for house prices by the location as well as the intrinsic characteristics of the houses). A stochastic model for a time series will generally reflect the fact that observations close together in time will be more closely related than observations further apart. In addition, time series models will often make use of the natural one-way ordering of time so that values for a given period will be expressed as deriving in some way from past values, rather than from future values (see time reversibility).

Time series analysis can be applied to:

- real-valued, continuous data
- discrete numeric data
- discrete symbolic data (i.e. sequences of characters, such as letters and words in the English language[1])

Methods for time series analyses may be divided into two classes: frequency-domain methods and time-domain methods. The former include spectral analysis and, recently, wavelet analysis; the latter include auto-correlation and cross-correlation analysis.

Additionally, time series analysis techniques may be divided into parametric and non-parametric methods. The parametric approaches assume that the underlying stationary stochastic process has a certain structure which can be described using a small number of parameters (for example, using an autoregressive or moving average model). In these approaches, the task is to estimate the parameters of the model that describes the stochastic process. By contrast, non-parametric approaches explicitly estimate the covariance or the spectrum of the process without assuming that the process has any particular structure.

Additionally, methods of time series analysis may be divided into linear and nonlinear, univariate and multivariate.

Raw data would be the basic numbers and details collected from research without any manipulations. It is the "input" for any statistical calculations. However, with justification, certain anomalies can be removed from a data set before performing calculations, or subjects might be excluded if they do not meet certain predefined criteria.

# Classifications of data

A. According to Nature

1. Quantitative data: information obtained from numeral variables (e.g. age, bills, etc.)
2. Qualitative data: information obtained from variables in the form of categories, characteristics, names or labels, or alphanumeric variables (e.g. birthdays, gender, etc.)

B. According to Source

1. Primary data: first-hand information (e.g. autobiography, financial statement)
2. Secondary data: second-hand information (e.g. biography, weather forecast from newspapers)

C. According to Measurement

1. Discrete data: countable numerical observations
   - whole numbers only
   - has an equal whole number interval
   - obtained through counting (e.g. corporate stocks, etc.)

2. Continuous data: measurable observations
   - decimals or fractions
   - obtained through measuring (e.g. bank deposits, volume of liquid, etc.)

D. According to Arrangement

1. Ungrouped data
   - raw data
   - no specific arrangement
2. Grouped data
   - organized set of data
   - at least 2 groups involved
   - arranged

Dichotomous data are data from outcomes that can be divided into two categories (e.g. dead or alive, pregnant or not pregnant), where each participant must be in one or other category, and cannot be in both.

A general question is how a statistician may approach obtaining valid data for the purposes of interpretation. Generally, in statistics, data is collected with the purpose of making inferences about a larger population which cannot be surveyed in full. The statistician has choices to make in a planned observational or experimental study, so the key to collecting data is that it is representative of the larger population that you are interested in. Simple random selection may be appropriate in many cases. More complex sampling schemes are possible, but in all cases the intent is still that the data can provide a significant, meaningful understanding of the population. Reducing bias in these surveys is very important.

It should be noted that surveys are not the only way of collecting data, and the list of alternatives to survey research is extensive. For instance, in a quality control situation, a sample of parts is selected from a larger batch of parts and tested. The principles of random sampling and statistical assumptions still apply. In media research, content analysis is frequently used to count and/or categorize randomly sampled media content (for example, comparing the volume or tone of war coverage in newspapers to television). In education, data may be in the form of test scores, GPA, etc.

Data can be complicated, and may not tell the full story. For example, suppose one road has a high number of accidents. Is it a problem of the road condition, poor signs, too many exits, or the drivers that use that road? Statistics and other information can help point to the most important factors.

# Central tendency

In statistics, the term central tendency relates to the way in which quantitative data tend to cluster around some value.[1] A measure of central tendency is any of a number of ways of specifying this "central value". In practical statistical analysis, the terms are often used before one has chosen even a preliminary form of analysis: thus an initial objective might be to "choose an appropriate measure of central tendency".

In the simplest cases, the measure of central tendency is an average of a set of measurements, the word average being variously construed as mean, median, or other measure of location, depending on the context. However, the term is applied to multidimensional data as well as to univariate data, and in situations where a transformation of the data values for some or all dimensions would usually be considered necessary: in the latter cases, the notion of a "central location" is retained in converting an "average" computed for the transformed data back to the original units. In addition, there are several different kinds of calculations for central tendency, where the kind of calculation depends on the type of data (level of measurement).

Both "central tendency" and "measure of central tendency" apply to either statistical populations or to samples from a population.
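The idea from the central tendency discussion of computing an "average" on transformed data and converting it back to the original units can be sketched with a logarithmic transform. This is only one possible transformation, chosen here as an assumption for illustration; the function name `back_transformed_mean` and the sample values are likewise hypothetical. With a log transform, the back-transformed average coincides with the geometric mean.

```python
import math
from statistics import mean


def back_transformed_mean(values):
    """Average the logarithms, then convert the result back to original units."""
    if any(v <= 0 for v in values):
        raise ValueError("the log transform requires strictly positive values")
    log_values = [math.log(v) for v in values]
    return math.exp(mean(log_values))


growth_factors = [1.0, 10.0, 100.0]  # hypothetical, strongly skewed data
print(mean(growth_factors))                   # arithmetic mean of the raw values
print(back_transformed_mean(growth_factors))  # equals the geometric mean
```

For these values the arithmetic mean is pulled towards the largest observation, while the back-transformed average sits at the middle value on the logarithmic scale.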

The following may be applied to individual dimensions of multidimensional data, after transformation, although some of these involve their own implicit transformation of the data.

- Arithmetic mean: the sum of all measurements divided by the number of observations in the data set
- Median: the middle value that separates the higher half from the lower half of the data set
- Mode: the most frequent value in the data set
- Geometric mean: the nth root of the product of the data values
- Harmonic mean: the reciprocal of the arithmetic mean of the reciprocals of the data values
- Weighted mean: an arithmetic mean that incorporates weighting to certain data elements
- Distance-weighted estimator: the measure uses weighting coefficients for x_i that are computed as the inverse mean distance between x_i and the other data points
- Truncated mean: the arithmetic mean of data values after a certain number or proportion of the highest and lowest data values have been discarded
- Midrange: the arithmetic mean of the maximum and minimum values of a data set
- Midhinge: the arithmetic mean of the two quartiles
- Trimean: the weighted arithmetic mean of the median and two quartiles
- Winsorized mean: an arithmetic mean in which extreme values are replaced by values closer to the median

## Introduction

A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. As such, measures of central tendency are sometimes called measures of central location. They are also classed as summary statistics. The mean (often called the average) is most likely the measure of central tendency that you are most familiar with, but there are others, such as the median and the mode.

The mean, median and mode are all valid measures of central tendency but, under different conditions, some measures of central tendency become more appropriate to use than others. In the following sections we will look at the mean, mode and median and learn how to calculate them and under what conditions they are most appropriate to be used.

## Mean (Arithmetic)

The mean (or average) is the most popular and well known measure of central tendency. It can be used with both discrete and continuous data, although its use is most often with continuous data. The mean is equal to the sum of all the values in the data set divided by the number of values in the data set. So, if we have n values in a data set and they have values x_1, x_2, ..., x_n, then the sample mean, usually denoted by $\bar{x}$ (pronounced "x bar"), is:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

This formula is usually written in a slightly different manner using the Greek capital letter $\Sigma$, pronounced "sigma", which means "sum of...":

$$\bar{x} = \frac{\sum x}{n}$$
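Several of the measures named in this section can be computed in a few lines of Python. The sketch below is illustrative only: `sample_mean`, `midrange` and `truncated_mean` are helper names assumed for this example, the data values are hypothetical, and the remaining measures come from Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later).

```python
import statistics


def sample_mean(values):
    """Sum of all values divided by the number of values (x-bar)."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


def midrange(values):
    """Arithmetic mean of the maximum and minimum values."""
    return (max(values) + min(values)) / 2


def truncated_mean(values, discard=1):
    """Mean after dropping `discard` values from each end of the sorted data."""
    ordered = sorted(values)
    if 2 * discard >= len(ordered):
        raise ValueError("too many values discarded")
    return sample_mean(ordered[discard:len(ordered) - discard])


data = [2, 4, 4, 8, 32]  # hypothetical measurements
print(sample_mean(data), statistics.mean(data))
print(statistics.median(data), statistics.mode(data))
print(statistics.geometric_mean(data), statistics.harmonic_mean(data))
print(midrange(data), truncated_mean(data))
```

Comparing the printed values shows how differently the measures react to the single large observation in the data.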

You may have noticed that the above formula refers to the sample mean. So, why have we called it a sample mean? This is because, in statistics, samples and populations have very different meanings and these differences are very important, even if, in the case of the mean, they are calculated in the same way. To acknowledge that we are calculating the population mean and not the sample mean, we use the Greek lower case letter "mu", denoted as µ:

$$\mu = \frac{\sum x}{n}$$

The mean is essentially a model of your data set: a single value that stands in for all of the observations. You will notice, however, that the mean is not often one of the actual values that you have observed in your data set. However, one of its important properties is that it minimises error in the prediction of any one value in your data set. That is, it is the value that produces the lowest total squared error from all other values in the data set.

An important property of the mean is that it includes every value in your data set as part of the calculation. In addition, the mean is the only measure of central tendency where the sum of the deviations of each value from the mean is always zero.

## When not to use the mean

The mean has one main disadvantage: it is particularly susceptible to the influence of outliers. These are values that are unusual compared to the rest of the data set by being especially small or large in numerical value. For example, consider the wages of staff at a factory below:

| Staff | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Salary | 15k | 18k | 16k | 14k | 15k | 15k | 12k | 17k | 90k | 95k |

The mean salary for these ten staff is \$30.7k. However, inspecting the raw data suggests that this mean value might not be the best way to accurately reflect the typical salary of a worker, as most workers have salaries in the \$12k to \$18k range. The mean is being skewed by the two large salaries. Therefore, in this situation we would like to have a better measure of central tendency. As we will find out later, taking the median would be a better measure of central tendency in this situation.

Another time when we usually prefer the median over the mean (or mode) is when our data is skewed (i.e. the frequency distribution for our data is skewed). If we consider the normal distribution, as this is the most frequently assessed in statistics, when the data is perfectly normal then the mean, median and mode are identical. Moreover, they all represent the most typical value in the data set. However, as the data becomes skewed the mean loses its ability to provide the best central location for the data, as the skewed data is dragging it away from the typical value. The median, by contrast, best retains this position and is not as strongly influenced by the skewed values. This is explained in more detail in the skewed distribution section later in this guide.

## Median

The median is the middle score for a set of data that has been arranged in order of magnitude. The median is less affected by outliers and skewed data.
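The factory salary example can be checked directly. The sketch below uses the ten salaries from the example (in thousands of dollars) and Python's standard `statistics` module; the variable names are assumptions of this illustration. It compares the mean with the median and confirms that the deviations from the mean cancel out.

```python
from statistics import mean, median

# Salaries of the ten staff in the example, in thousands of dollars.
salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

mean_salary = mean(salaries)      # pulled upwards by the two large salaries
median_salary = median(salaries)  # stays inside the 12k to 18k range

# Deviations from the mean always cancel out (up to floating-point rounding).
deviation_sum = sum(s - mean_salary for s in salaries)

print(mean_salary, median_salary)
print(abs(deviation_sum) < 1e-9)
```

The median lands among the typical salaries, whereas the mean lies above what eight of the ten workers actually earn.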

In order to calculate the median, suppose we have the data below:

65 55 89 56 35 14 56 55 87 45 92

We first need to rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 **56** 56 65 87 89 92

Our median mark is the middle mark, in this case 56 (highlighted in bold). It is the middle mark because there are 5 scores before it and 5 scores after it. This works fine when you have an odd number of scores, but what happens when you have an even number of scores? What if you had only 10 scores? Well, you simply have to take the middle two scores and average the result. So, if we look at the example below:

65 55 89 56 35 14 56 55 87 45

We again rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89

Only now we have to take the 5th and 6th scores in our data set (55 and 56) and average them to get a median of 55.5.

## Mode

The mode is the most frequent score in our data set. On a bar chart or histogram, it corresponds to the highest bar. You can, therefore, sometimes consider the mode as being the most popular option. Normally, the mode is used for categorical data where we wish to know which is the most common category.

## Summary of when to use the mean, median and mode

Please use the following summary table to know what the best measure of central tendency is with respect to the different types of variable.

| Type of Variable | Best measure of central tendency |
| --- | --- |
| Nominal | Mode |
| Ordinal | Median |
| Interval/Ratio (not skewed) | Mean |
| Interval/Ratio (skewed) | Median |

Measures of central tendency and dispersion provide a convenient way to describe and compare sets of data.

# Importance of Statistics in Different Fields

Statistics plays a vital role in every field of human activity. Statistics has an important role in determining the existing position of per capita income, unemployment, population growth rate, housing, schooling, medical facilities, etc. in a country. Statistics now holds a central position in almost every field, such as Industry, Commerce, Trade, Physics, Chemistry, Economics, Mathematics, Biology, Botany, Psychology and Astronomy, so the application of statistics is very wide. Now we discuss some important fields in which statistics is commonly applied.