Reading 1: Data, Data Types and Data Scales

Data: The Foundation of Statistical Analysis

Data are the building blocks of statistical analysis, providing the raw material from which
insights and conclusions are drawn. We begin our Data Analysis course with this reading, which
introduces the fundamental concepts of data, data types, scales of measurement, and
sources of data. Understanding these concepts will equip you to navigate the
world of statistics and make informed, data-driven decisions.

Data: Basics
Data are essentially the facts and figures collected, analysed, and summarized for the
purpose of presentation and interpretation. When we refer to all the data collected in a
particular study, we call it the "data set" for that study. These data sets serve as the
foundation upon which statistical analysis is built.

Example of a data set


Candidate   Gender   Communication skill   Quantitative ability   Rank
name                 (out of 100)          (out of 100)           awarded
Ann         F        56                    65                     5
Cath        F        66                    58                     4
Dev         M        67                    62                     2
Ben         M        62                    54                     6
Katy        F        71                    66                     1
Don         M        58                    67                     3

Data: Components
To work with data effectively, we need to grasp the concepts of Elements, Variables, and
Observations:
Elements are the entities on which data are collected. These could be individuals, objects, or
other entities of interest in a study. In the example, each candidate is an element.
A Variable is a characteristic or attribute of interest for the elements. It is what we
measure or observe in the study. In the example, the variables are Gender, Communication
skill score, Quantitative ability score, and Rank awarded.
Observations are the measurements collected on every variable for each element in
the study. The set of measurements for one element is one observation: the measurements for
the first candidate form the first observation, and so on. With 6 candidates, we have 6
observations.
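
As a minimal sketch, the example data set above can be held as a list of records in Python. The names `data_set`, `elements`, and `variables` are illustrative choices made here, not part of the original notes.

```python
# Each dictionary is one observation: the set of measurements for one element.
data_set = [
    {"name": "Ann",  "gender": "F", "communication": 56, "quantitative": 65, "rank": 5},
    {"name": "Cath", "gender": "F", "communication": 66, "quantitative": 58, "rank": 4},
    {"name": "Dev",  "gender": "M", "communication": 67, "quantitative": 62, "rank": 2},
    {"name": "Ben",  "gender": "M", "communication": 62, "quantitative": 54, "rank": 6},
    {"name": "Katy", "gender": "F", "communication": 71, "quantitative": 66, "rank": 1},
    {"name": "Don",  "gender": "M", "communication": 58, "quantitative": 67, "rank": 3},
]

# Elements: the entities on which data are collected (here, the candidates).
elements = [row["name"] for row in data_set]

# Variables: the characteristics recorded for every element.
# The candidate name identifies the element, so it is not counted as a variable.
variables = [key for key in data_set[0] if key != "name"]

print(len(elements), "elements:", elements)
print(len(variables), "variables:", variables)
print(len(data_set), "observations")
```

Counting this way gives one observation per element, which matches the definition above.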

Notes prepared for Data Analysis, PGDM –online SPJIMR


by Dr. Debmallya Chatterjee and Ms. Binita Salian.

Data: Types
Data can be categorized into two primary types: Categorical and Quantitative.
Categorical Data: Data that can be grouped into specific categories are Categorical. These
categories are typically represented using Nominal or Ordinal scales (explained under Scales of
Measurement below).
Quantitative Data: Data that use numeric values to indicate how much or how many are
considered Quantitative. Quantitative data are usually measured using either Interval or
Ratio scales (explained under Scales of Measurement below).
Quantitative data can be either Discrete or Continuous. Discrete data take distinct,
separate values and measure "how many." Continuous data can take any value within a range,
with no clear separation between values, and measure "how much."
The type of data dictates the appropriate statistical analysis. Categorical data are typically
summarized by counting observations in each category or computing proportions. In
contrast, arithmetic operations such as addition, subtraction, multiplication, and division
provide meaningful results for quantitative data, allowing for more extensive statistical
analysis.
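
The difference in how the two types are summarized can be sketched in a few lines of Python. The values are taken from the Gender and Communication skill columns of the example data set. The variable names are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean

genders = ["F", "F", "M", "M", "F", "M"]       # categorical variable
communication = [56, 66, 67, 62, 71, 58]       # quantitative variable

# Categorical data: count the observations in each category, then compute proportions.
counts = Counter(genders)
proportions = {category: n / len(genders) for category, n in counts.items()}

# Quantitative data: arithmetic is meaningful, so an average can be computed.
average_score = mean(communication)

print(counts)
print(proportions)
print(round(average_score, 2))
```

Averaging the gender labels would have no meaning, whereas averaging the scores does.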

Cross-Sectional and Time Series Data


Distinguishing between cross-sectional and time series data is crucial for statistical analysis.
Cross-Sectional Data: These are data collected at the same or approximately the same point
in time. They provide a snapshot view of a specific point in time. They are often used for
comparing different entities at a specific moment.
Time Series Data: Time series data are collected over multiple time periods, allowing for the
analysis of trends, patterns, and changes over time. This type of data is valuable for
forecasting and understanding historical developments.
Visualization techniques such as line and bar charts of time series data are commonly
used to identify trends and make future projections.
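
The contrast between the two kinds of data can be sketched as follows. All figures, store names, and period labels below are hypothetical, invented only to illustrate the idea.

```python
# Hypothetical figures, invented for illustration only.

# Cross-sectional: several entities observed at one point in time.
cross_section = {"Store A": 120, "Store B": 95, "Store C": 140}

# Time series: one entity observed over successive periods.
time_series = [("2023-Q1", 100), ("2023-Q2", 110), ("2023-Q3", 121), ("2023-Q4", 115)]

# Cross-sectional question: which entity has the largest value at this moment?
top_entity = max(cross_section, key=cross_section.get)

# Time series question: how did the value change from each period to the next?
changes = [
    (later[0], later[1] - earlier[1])
    for earlier, later in zip(time_series, time_series[1:])
]

print(top_entity)
print(changes)
```

The first computation compares entities at a single point in time. The second follows one entity across periods, which is the starting point for trend analysis.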

Data: Scales of Measurement


Data come in different scales of measurement, each offering varying levels of information
and implications for data analysis. The four primary scales of measurement are Nominal,
Ordinal, Interval, and Ratio.

Nominal Scale: When data for a variable consist of labels or names used to identify an
attribute of an element, the scale is nominal. Numeric codes may be used for
convenience, for instance 1 for Domestic Equity, 2 for International Equity, and 3
for Fixed Income. The codes 1, 2, and 3 are still nominal data, even though they
are numeric values.


Ordinal Scale: If data exhibit the properties of nominal data and the order or rank of the
data is meaningful, the scale is ordinal. For example, customer ratings like
"excellent," "good," or "poor" can be ranked, making the scale ordinal. Here the order
relative to others is known, but not the magnitude (i.e., by how much).

Interval Scale: Data are on an interval scale if they have all the properties of ordinal data
and the interval between values is expressed in terms of a fixed unit of measure. Interval
data are always numeric. SAT scores, for instance, are on an Interval scale because the
differences between measured scores are meaningful.

Ratio Scale: Data are on a ratio scale if they have all the properties of interval data and the ratio
of two values is meaningful. Variables like distance, height, weight, and time are typically measured
on a ratio scale. A crucial aspect of the ratio scale is the presence of a meaningful zero point,
indicating that nothing exists for the variable at that point.
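
The following sketch illustrates which operations are meaningful on each of the four scales. All sample values (fund codes, ratings, scores, weights) are hypothetical, and the variable names are chosen here for illustration.

```python
from statistics import mode

# Nominal: codes are labels only, so counting and the mode are meaningful,
# but an average of the codes is not.
# 1 = Domestic Equity, 2 = International Equity, 3 = Fixed Income
fund_codes = [1, 3, 2, 1, 1, 3]
most_common_fund = mode(fund_codes)

# Ordinal: order is meaningful, so ratings can be sorted and a middle value found.
rating_order = {"poor": 0, "good": 1, "excellent": 2}
ratings = ["good", "excellent", "poor", "good", "excellent"]
sorted_ratings = sorted(ratings, key=rating_order.get)
median_rating = sorted_ratings[len(sorted_ratings) // 2]

# Interval: differences between values are meaningful.
sat_a, sat_b = 620, 550
score_difference = sat_a - sat_b

# Ratio: a meaningful zero point makes ratios meaningful as well.
weight_a, weight_b = 80.0, 40.0
weight_ratio = weight_a / weight_b

print(most_common_fund, median_rating, score_difference, weight_ratio)
```

Each scale allows the operations of the scales before it plus one more: nominal allows counting, ordinal adds ordering, interval adds differences, and ratio adds ratios.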

Data Sources
Data can be obtained from various sources, and the choice of source depends on the
availability, relevance, and purpose of the data.
Existing Sources: In some cases, the data needed for a particular study or application already
exist. Companies maintain internal databases about their operations,
customers, and employees. External data sources, such as Dun & Bradstreet, Bloomberg,
and Dow Jones & Company, provide extensive business data. Industry associations and
special interest organizations also offer relevant data.
The Internet: The internet has become a valuable source of data and statistical information.
Many companies maintain websites with data on various aspects of their operations, and
specialized data providers offer a wide range of information, from stock quotes to meal
prices.
Government Agencies: Government agencies are important sources of data, particularly for
economic and labour-related statistics. Agencies like the U.S. Department of Labor provide
data on employment rates, wage rates, labour force size, and more.

Statistical Studies
When data aren't readily available from existing sources, statistical studies can be
conducted to gather the necessary information. These studies can be Experimental or
Observational.
Experimental Studies: In experimental studies, researchers identify a variable of interest
and control one or more other variables to study their influence on the variable of interest. For
example, a pharmaceutical company might conduct an experiment to understand how a
new drug affects the nervous system by controlling dosage levels.
Observational Studies: Observational studies, on the other hand, do not involve control
over variables. Surveys are a common form of observational study, where questionnaires
are administered to a sample to gather data. Observational studies are often used in fields
like market research to understand customer opinions.
Consideration of Time and Cost: Decision-makers must be aware of the time and cost
required to obtain data. Using existing data sources is preferable when time is limited. If
necessary data are not readily available, the additional time and cost involved in data
collection must be considered. The value of statistical analysis should outweigh the
expenses incurred during data acquisition.

Data Acquisition Errors


Data acquisition errors and missing values in a data set can have significant consequences
for decision-making, by impacting the reliability of statistical analysis. It's essential to be
vigilant about potential errors during data acquisition. These errors occur when the data
value obtained doesn't match the true or actual value. Errors can stem from various
sources, including recording mistakes, respondent misinterpretations, or data outliers.
Experienced data analysts employ techniques to minimize errors, such as reviewing data for
internal consistency and identifying outliers. Blindly using erroneous data can lead to
misleading conclusions and poor decision-making, emphasizing the importance of acquiring
accurate data for reliable analysis.
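
A simple consistency review of this kind might look like the sketch below. It assumes scores are out of 100, as in the example data set. The records contain deliberately introduced problems, and the helper `find_problems` is a hypothetical function written for this illustration, not a standard routine.

```python
# Hypothetical raw records; two of them contain deliberate problems.
records = [
    {"name": "Ann",  "communication": 56,   "quantitative": 65},
    {"name": "Cath", "communication": 660,  "quantitative": 58},  # likely a recording error
    {"name": "Dev",  "communication": None, "quantitative": 62},  # missing value
]


def find_problems(rows, fields, low=0, high=100):
    """Return (name, field, issue) tuples for missing or out-of-range values."""
    problems = []
    for row in rows:
        for field in fields:
            value = row.get(field)
            if value is None:
                problems.append((row["name"], field, "missing"))
            elif not low <= value <= high:
                problems.append((row["name"], field, "out of range"))
    return problems


problems = find_problems(records, ["communication", "quantitative"])
for name, field, issue in problems:
    print(f"{name}: {field} is {issue}")
```

A flagged value is not necessarily wrong. The check only points the analyst to entries that should be verified against the original source before analysis.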

In conclusion, data are the foundation of statistical analysis, and understanding their types,
scales of measurement, and sources is crucial for informed decision-making. Accurate data
collection and analysis are essential for drawing meaningful insights and making sound
decisions in various fields and industries.

_________________________________________________________________________________
References:
Statistics for Business and Economics, by David R. Anderson, Dennis J. Sweeney, and Thomas A. Williams.
Statistics for Business: Decision Making and Analysis, by Robert A. Stine and Dean Foster.
