
Data Understanding Table

The Data Understanding table provides an organized approach to comprehending the data
provided. Without a good understanding of the columns in the data, it is not possible to
perform a good analysis. An understanding of the data acts as the “foundation” for
the analysis. This includes having a definition for each column, which isn’t always as obvious as
you might think, as well as the characteristics of the data. Consider the table a “dictionary” and
“road map” for the rest of the analysis.

In many cases, the data analyst is responsible for creating the Data Understanding table since
such a table is usually not available. This is typically a difficult task. As more information
about the columns is learned, the Data Understanding table should be updated.

The Data Understanding table includes the following columns.

Column Name | Data Purpose | Data Type | Definition | Comments

• Column Name – Name of the column in your data set.


• Data Purpose – Whether the column is a “dimension” or a “target”.
  ➢ Targets are used to define success for the organization (e.g., profit, GPA, quantity
    sold).
    ▪ Targets tend to be numeric but may not always be a number (e.g., win or lose,
      pass or fail).
    ▪ Conversely, even though a column is numeric, it may not be a Target
      (e.g., phone number, age, temperature).
    ▪ Consider the column to see whether the value can define success. For
      example, sales price is what you want to charge for a product, but it doesn’t
      define success.
  ➢ Dimensions allow the Targets to be broken down into smaller pieces or examined
    from various perspectives. Dimensions can include columns such as dates,
    regions, and products.
    ▪ For example, assume the Target is GPA. The GPA can be examined from
      different dimensions such as GPA by semester; GPA by major; GPA by type of
      assignment (e.g., exams, homework, projects); and GPA by athlete/non-athlete.
    ▪ For example, assume the Target is Profit. The profit can be “sliced and diced”
      by day or week, product, product category, gender of customer, time of day,
      location of store, etc.
    ▪ Dimensions tend to be non-numeric, but numeric values (age, baseball inning,
      temperature) can also be dimensions.
    ▪ Consider the column to see whether the value in the column could be used to
      split up any Targets that you have.
• Data Type – Describes the type of information in the column. Typical entries will be:
  ➢ String – Includes both letters and numbers. Also called “Text” or “Alphanumeric”
    in some cases.
  ➢ Date/Time – Includes data that is either a date or a time or both.
  ➢ Numeric – Data that is composed of numbers.
    ▪ Not all fields with numbers are numeric. Consider whether meaningful math
      can be performed on a column to determine whether the data is numeric
      (meaningful) or string (not meaningful). For example, Social Security numbers
      or telephone numbers consist of digits but would be considered “string”
      since any math performed on the columns would not make sense.
  ➢ Geographic – Data that contains some sort of location that can typically be
    plotted on a map. This can include information such as city, state, longitude, or
    latitude.
    ▪ Certain data such as airports or train stations can be considered a
      geographic variable if the information can be plotted on a map.
• Definition – The meaning of the column in non-technical terms. The more
  information provided, the better. If the column is a calculation, include the formula
  and an explanation.
• Comments – Any additional comments that may help explain the values. This might
  include allowed values (e.g., a full-time student takes between 12 and 18.5 credits).
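
To make the five columns concrete, here is a minimal sketch of how a Data Understanding table could be kept in Python. It is only an illustration and not a prescribed format. The class name ColumnEntry, the field names, and the student data set (gpa, major, phone, credits) with its definitions are all hypothetical and invented for this example. The allowed Data Purpose and Data Type values come from the descriptions above.

```python
from dataclasses import dataclass

# Allowed values, taken from the column descriptions in this handout.
PURPOSES = {"Dimension", "Target"}
DATA_TYPES = {"String", "Date/Time", "Numeric", "Geographic"}


@dataclass
class ColumnEntry:
    """One row of a Data Understanding table (hypothetical structure)."""
    column_name: str
    data_purpose: str
    data_type: str
    definition: str
    comments: str = ""

    def __post_init__(self):
        # Reject values outside the categories listed in the handout.
        if self.data_purpose not in PURPOSES:
            raise ValueError(f"unknown data purpose: {self.data_purpose}")
        if self.data_type not in DATA_TYPES:
            raise ValueError(f"unknown data type: {self.data_type}")


# Hypothetical student data set; the definitions are invented for illustration.
table = [
    ColumnEntry("gpa", "Target", "Numeric",
                "Grade point average for the semester."),
    ColumnEntry("major", "Dimension", "String",
                "Student's declared field of study."),
    ColumnEntry("phone", "Dimension", "String",
                "Student's contact phone number.",
                "Made of digits, but math on it is not meaningful."),
    ColumnEntry("credits", "Dimension", "Numeric",
                "Credits enrolled this semester.",
                "Full-time student is between 12 and 18.5 credits."),
]

# Separate the columns that define success from the ones used to split them up.
targets = [e.column_name for e in table if e.data_purpose == "Target"]
dimensions = [e.column_name for e in table if e.data_purpose == "Dimension"]

# Print a simple fixed-width view of the table.
for e in table:
    print(f"{e.column_name:<10}{e.data_purpose:<11}{e.data_type:<10}{e.definition}")
```

Keeping the table as structured entries rather than free text makes it easy to update as more is learned about each column, and to list every Target alongside the Dimensions that could be used to break it down.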
