
Analytical Methodology

• In terms of methodology, analytics differs significantly from the traditional statistical approach of experimental design. Analytics starts with data, which we normally model in a way that explains a response.
• The objective of this approach is to predict the response behavior or to understand how the input variables relate to the response. In statistical experimental design, by contrast, an experiment is designed first and the data is retrieved as a result.
• This makes it possible to generate data in a way that can be used by a statistical model, where certain assumptions hold, such as independence, normality, and randomization.
• Normally, once the business problem is defined, a research stage is needed to design the methodology to be used. However, some general guidelines are worth mentioning because they apply to almost all problems.
Analytical Methodology
• One of the most important tasks in data analytics is statistical modeling, meaning
supervised and unsupervised classification or regression problems.
• Once the data is cleaned, preprocessed, and available for modeling, care should be taken to evaluate different models with reasonable loss metrics; once a model is implemented, further evaluation should be carried out and the results reported.
• A common pitfall in predictive modeling is to just implement the model and never
measure its performance.
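As a minimal sketch of this point (using scikit-learn and synthetic data; none of this comes from the slides), a model's performance should always be measured on held-out data with an explicit loss metric:

```python
# Minimal sketch: fit a model, then actually measure its performance
# on held-out data instead of stopping at implementation.
# scikit-learn and the synthetic regression data are assumptions
# made purely for illustration.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
mse = mean_squared_error(y_test, model.predict(X_test))
print(f"Held-out MSE: {mse:.2f}")  # the evaluation step that is too often skipped
```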
Analytical Methodology
• Preparing objectives & identifying data requirements
• Data Collection
• Understanding data
• Data preparation – Data Cleansing, Normalization


Analytical Methodology
Preparing objectives & identifying data requirements
• Despite the increase in computing power and access to data over the last couple of decades, our ability to use data within the decision-making process is often lost or not fully exploited.
• Too often, we don't have a solid understanding of the questions being asked and
how to apply the data correctly to the problem at hand.
• It is very important to follow a methodology for data science that ensures the data used in problem solving is relevant and properly manipulated to address the question at hand.
• Data requirements definition establishes the process used to identify, prioritize,
precisely formulate, and validate the data needed to achieve business objectives.
Analytical Methodology
Preparing objectives & identifying data requirements
• When documenting data requirements, data should be referenced in business
language, reusing approved standard business terms if available.
• If business terms have not yet been standardized and approved for the data within
scope, the data requirements process provides the occasion to develop them.
Analytical Methodology
Data Collection

Data collection is an important aspect of research. Consider the example of a mobile manufacturer, company X, which is launching a new product variant.

To conduct research about features, price range, target market, competitor analysis, etc., data has to be collected from appropriate sources.

The marketing team can conduct various data collection activities such as online
surveys or focus groups.

The survey should have all the right questions about features and pricing, such as "What are the top 3 features expected from an upcoming product?", "How much are you likely to spend on this product?", or "Which competitors provide similar products?"
Analytical Methodology
Data Collection

Qualitative vs. Quantitative

Primary vs. Secondary

Online vs. Offline

Interview vs. Questionnaire

Telephonic vs. Personal
Analytical Methodology
Quantitative Data Collection
Understanding the Data
“Without context, data is useless, and any visualization you create with it will also be useless.” Using data without knowing anything about it, other than the values themselves, is not enough.
Understanding the Data
You should know the who, what, when, where, why, and how of the data before you can know what the numbers are actually about.

Who
“A quote in a major newspaper carries more weight than one from a
celebrity gossip site that has a reputation for stretching the truth.”

Similarly, data from a reputable source typically implies better accuracy than a
random online poll.

In addition to who collected the data, who the data is about also matters.
Understanding the Data
How?
People often skip the methodology because it tends to be complex and written for a technical audience, but it’s worth getting to know the gist of how the data of interest was collected.
• Do you trust it right away, or do you investigate?

• Look out for small samples, high margins of error, and unfit assumptions about the subjects (a short sketch of the margin-of-error check follows this list).

• Sometimes people build indices to measure abstract concepts, such as the quality of life in countries, from metrics like literacy; check how such composite measures are constructed.
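To make the margin-of-error check concrete, here is a minimal sketch; the 95% normal-approximation formula MOE = z * sqrt(p(1-p)/n) is a standard textbook formula, not something given in these notes:

```python
# Minimal sketch: 95% margin of error for a survey proportion,
# using the normal approximation MOE = z * sqrt(p * (1 - p) / n).
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for proportion p observed in n subjects."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (50, 500, 5000):
    print(f"n={n:5d}: +/- {margin_of_error(0.5, n):.1%}")
# Small samples carry large margins of error: n=50 gives roughly +/- 13.9%.
```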
Understanding the Data
What?
Ultimately, you want to know what your data is about, but before you
can do that, you should know what surrounds the numbers.
Talk to subject experts, study accompanying documentation.
When you get to real-world data, the goal shifts to information
gathering.
You shift from, “What is in the numbers?” to “What does the data
represent in the world; does it make sense; and how does this relate to
other data?”
Understanding the Data
When?
Most data is linked to time in some way in that it might be a time series,
or it’s a snapshot from a specific period.
In both cases, you have to know when the data was collected. An
estimate made decades ago does not equate to one in the present.

This seems obvious, but it’s a common mistake to take old data and pass
it off as new because it’s what’s available.

Things change, people change, and places change, and so naturally, data
changes.
Understanding the Data
Where?
Things can change across cities, states, and countries just as they do
over time.

For example, it’s best to avoid global generalizations when the data
comes from only a few countries.

The same logic applies to digital locations. Data from websites such as Twitter or Facebook encapsulates the behavior of their users and doesn’t necessarily translate to the physical world.
Understanding the Data
Why?
You must know the reason data was collected, mostly as a
sanity check for bias.
Sometimes data is collected, or even fabricated, to serve an
agenda, and you should be wary of these cases.
What is Data Preparation?
Data preparation is the process of cleaning and transforming raw data prior to
processing and analysis.
It is an important step prior to processing and often involves reformatting data, correcting data, and combining data sets to enrich the data.
Data preparation is often a lengthy undertaking for data professionals or business
users, but it is essential as a prerequisite to put data in context in order to turn it into
insights and eliminate bias resulting from poor data quality.

For example, the data preparation process usually includes standardizing data
formats, enriching source data, and/or removing outliers.
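As a brief illustration of format standardization (a pandas sketch; the column names and values are invented, and the mixed-format date parsing assumes pandas 2.x):

```python
# Minimal sketch: standardizing data formats with pandas.
# The columns ("order_date", "amount") are hypothetical examples;
# format="mixed" requires pandas 2.x.
import pandas as pd

raw = pd.DataFrame({
    "order_date": ["2023-01-05", "January 5, 2023", "2023/01/05"],
    "amount": ["1,200.50", "300", "45.99"],
})

# Parse mixed date representations into a single datetime dtype.
raw["order_date"] = pd.to_datetime(raw["order_date"], format="mixed")
# Strip thousands separators and coerce amounts to a numeric dtype.
raw["amount"] = pd.to_numeric(raw["amount"].str.replace(",", ""))
print(raw.dtypes)
```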
Data Preparation
Benefits of Data Preparation
“Most data scientists say that data preparation is the worst part of their job, but efficient, accurate business decisions can only be made with clean data.”
Data preparation helps:
Fix errors quickly — Data preparation helps catch errors before processing. After
data has been removed from its original source, these errors become more difficult
to understand and correct.
Produce top-quality data — Cleaning and reformatting datasets ensures that all data
used in analysis will be high quality.
Make better business decisions — Higher quality data that can be processed and
analyzed more quickly and efficiently leads to more timely, efficient and high-
quality business decisions.
Data Preparation
Data Preparation Steps:

1. Gather data

2. Discover and assess data

3. Cleanse and validate data

4. Transform and enrich data

5. Store data
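A compact sketch of how these five steps might map onto a pandas workflow; the file names, columns, and Parquet store are hypothetical choices, not prescriptions:

```python
# Minimal sketch mapping the five preparation steps to pandas calls.
# The source file, column names, and Parquet target are invented.
import pandas as pd

# 1. Gather data
df = pd.read_csv("sales_raw.csv")            # hypothetical source file

# 2. Discover and assess data
df.info()                                    # dtypes and non-null counts
print(df.describe(include="all"))            # summary statistics

# 3. Cleanse and validate data
df = df.drop_duplicates()
assert df["order_id"].notna().all(), "order_id must not be missing"

# 4. Transform and enrich data
df["order_month"] = pd.to_datetime(df["order_date"]).dt.to_period("M")

# 5. Store data
df.to_parquet("sales_prepared.parquet")      # needs pyarrow or fastparquet
```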
Data Cleaning?
• Data cleaning is the process of fixing or removing incorrect,
corrupted, incorrectly formatted, duplicate, or incomplete data within a
dataset.
• When combining multiple data sources, there are many opportunities
for data to be duplicated or mislabeled.
• If data is incorrect, outcomes and algorithms are unreliable, even
though they may look correct.
• There is no one absolute way to prescribe the exact steps in the data
cleaning process because the processes will vary from dataset to
dataset.
• But it is crucial to establish a template for your data cleaning process
so you know you are doing it the right way every time.
Step 1: Remove duplicate or irrelevant observations
Remove unwanted observations from your dataset, including duplicates and irrelevant records.

Step 2: Fix structural errors
Structural errors arise when you measure or transfer data; they include strange naming conventions, typos, and incorrect capitalization.
Step 3: Filter unwanted outliers
Often there will be one-off observations that, at a glance, do not appear to fit within the data you are analysing.
If you have a legitimate reason to remove an outlier, such as improper data entry, doing so will improve the quality of the data you are working with.
Step 4: Handle missing data
You can’t ignore missing data because many algorithms will not accept missing values. There are several ways to deal with missing data.
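A minimal pandas sketch of these four steps on a toy dataset; the columns and the 1.5 × IQR outlier rule are illustrative assumptions, not the only reasonable choices:

```python
# Minimal sketch of the four cleaning steps with pandas.
# The toy data and the 1.5 * IQR outlier rule are illustrative only.
import pandas as pd

df = pd.DataFrame({
    "city":  ["NYC", "NYC", "nyc", "Boston", "Boston"],
    "price": [10.0, 10.0, 12.5, 9.0, 9000.0],
})

# Step 1: remove duplicate observations.
df = df.drop_duplicates()

# Step 2: fix structural errors (inconsistent capitalization here).
df["city"] = df["city"].str.upper()

# Step 3: filter unwanted outliers with the common 1.5 * IQR rule.
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Step 4: handle missing data, e.g. impute the median (a no-op on this
# toy frame; many algorithms reject missing values outright).
df["price"] = df["price"].fillna(df["price"].median())
print(df)
```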
Data Blending
• Data blending is the process of combining data from multiple sources into a
functioning dataset.
• This process is gaining attention among analysts and analytic companies because
it is a quick and straightforward method used to extract value from multiple data
sources.
• It can help to discover correlations between the different data sets without the time
and expense of traditional data warehouse processes.
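A minimal sketch of blending two sources in pandas; the CRM and web-analytics frames and the customer_id join key are hypothetical:

```python
# Minimal sketch: blending CRM and web-analytics data on a shared key.
# Both datasets and the "customer_id" key are invented for illustration.
import pandas as pd

crm = pd.DataFrame({"customer_id": [1, 2, 3],
                    "segment": ["SMB", "Enterprise", "SMB"]})
web = pd.DataFrame({"customer_id": [1, 2, 4],
                    "visits": [14, 3, 8]})

# A left join keeps every CRM customer and enriches it with web behavior.
blended = crm.merge(web, on="customer_id", how="left")
print(blended)  # customer 3 has NaN visits: present in the CRM source only
```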
The goals of data blending
• Reveal a “deeper intelligence” within your data, by utilizing data from multiple
sources
• Put accurate, actionable data in the hands of business analysts in a timely manner
• Drive better decision-making by senior leaders in an organization
Data modeling
• Data modeling is the process of creating a data model for the data to be stored in a database.
• This data model is a conceptual representation of data objects, the associations between different data objects, and the rules that govern them.
• Data modeling helps in the visual representation of data and enforces business
rules, regulatory compliances, and government policies on the data.
• Data models ensure consistency in naming conventions, default values, semantics, and security, while ensuring the quality of the data.
• The Data Model is an abstract model that organizes data description, data
semantics, and consistency constraints of data.
• The data model emphasizes what data is needed and how it should be organized, rather than what operations will be performed on the data.
• Data Model is like an architect's building plan, which helps to build conceptual
models and set a relationship between data items.
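As a small sketch of how a conceptual data model pins down entities, associations, and defaults in code (the Customer and Order entities are hypothetical examples):

```python
# Minimal sketch: a conceptual data model expressed as dataclasses.
# The entities, naming conventions, and defaults are invented examples
# of the kinds of rules a data model fixes.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Order:
    order_id: int
    order_date: date
    amount: float = 0.0            # default value fixed by the model

@dataclass
class Customer:
    customer_id: int
    name: str
    orders: list[Order] = field(default_factory=list)  # one-to-many association

alice = Customer(customer_id=1, name="Alice")
alice.orders.append(Order(order_id=100, order_date=date(2024, 1, 5), amount=49.9))
```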
Data Visualization
“Data visualization is the visual representation of your data. With the help of charts,
maps, and other graphical elements these tools provide a simple and comprehensible
way to clearly see and easily discover insights and patterns in your data.”
Why do we need data visualization?
With the help of descriptive graphics and dashboards, even difficult information can
be clear and comprehensible.
Here are some noteworthy numbers, based on research, that confirm the importance
of visualization:
• People get 90% of information about their environment from the eyes.
• 50% of brain neurons take part in visual data processing.
• Pictures increase the desire to read a text by up to 80%.
• People remember 10% of what they hear, 20% of what they read, and 80% of
what they see.
Advantages
Relevant visualization brings lots of advantages for your business:

• Fast decision-making. Summing up data is easy and fast with graphics, which let
you quickly see that a column or touchpoint is higher than others without looking
through several pages of statistics in Google Sheets or Excel.
• More people involved. Most people are better at perceiving and remembering
information presented visually.
• Higher degree of involvement. Beautiful and bright graphics with clear messages
attract readers’ attention.
• Better understanding. Perfect reports are transparent not only for technical
specialists, analysts, and data scientists but also for CMOs and CEOs, and help
each and every worker make decisions in their area of responsibility.
Common general types of data visualization:
Charts
Tables
Graphs
Maps
Infographics
Dashboards
More specific examples of methods to visualize data:
Area Chart
Bar Chart
Box-and-whisker Plots
Scatter plot
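A minimal matplotlib sketch of two of the chart types above (the numbers are invented purely for illustration):

```python
# Minimal sketch: a bar chart and a scatter plot with matplotlib.
# All values are invented for illustration.
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Bar chart: compare a quantity across categories at a glance.
ax1.bar(["Q1", "Q2", "Q3", "Q4"], [120, 150, 90, 180])
ax1.set_title("Sales by quarter")

# Scatter plot: reveal the relationship between two variables.
ax2.scatter([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
ax2.set_title("Ad spend vs. revenue")

plt.tight_layout()
plt.show()
```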
