
Introduction to the Data Research Process

1. Business Understanding
When analyzing data for an industry, we should have a clear overview and
understanding of that industry: what it does, what kind of decisions it is going to
make, and the purpose for which the data is being analyzed. Every data analysis
process starts with a question. Many people think that analysis starts from a data
set, and that the availability of a data set is sufficient to analyze any kind of
pattern. In practice there is no ready-made data set to begin with; the questions
themselves define which data sets are needed. The only challenge is that answering
one question can raise another, but that is fine. It is in fact a normal part of the
data analysis process.

2. Acquire the Raw Data


This is the step where, after the question has been defined, data is collected from
different sources such as data warehouses, logs, and existing data sets in order to
answer it. The data is queried to answer the questions, but the result is not yet a
finished data set. We call it raw data because it is not yet in the form we need for
analysis.

3. Extract the Data


This is the step where data is extracted to create the final, clean data set that
the rest of the analysis will work from. SQL is used to extract the data from the
database, and the database being queried may hold more than 1 million rows.
Database query languages like SQL enable an analyst to analyze and transform data
easily. SQL is the first thing you should learn, as it enables you to work on the
dataset.
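
As a rough illustration of the extraction step, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name raw_orders, its columns, and the sample rows are all assumptions made up for this example; a real extraction would query the organization's own warehouse.

```python
import sqlite3

# Hypothetical raw table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [
        (1, "North", 120.0),
        (2, "South", None),   # missing amount: filtered out of the clean set
        (3, "North", 80.0),
        (4, "South", 50.0),
    ],
)

# Extract a clean, aggregated data set with a single SQL query.
query = """
    SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM raw_orders
    WHERE amount IS NOT NULL
    GROUP BY region
    ORDER BY region
"""
clean_rows = conn.execute(query).fetchall()
conn.close()

for region, orders, revenue in clean_rows:
    print(region, orders, revenue)
```

The WHERE clause does the cleaning and the GROUP BY does the summarizing, so the Python side only has to read the result.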

4. Transform the Data


Data transformation is the process of converting data or a dataset from one state or
structure to another. It is a fundamental stage of data integration, in which data
collected from different sources is integrated into a particular structure so that it
can be used at a destination for analysis. This process is known as ETL (Extract,
Transform, Load). Data transformation begins with detecting and understanding the
data in its original or source format. This is usually achieved with the help of
algorithms implemented in data analysis and profiling tools. This step helps you
decide what needs to happen to the data to get it into the desired or requested
format. Generally, languages such as R or Python enable you to perform data
transformation on large or complex data coming from the source.
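
A minimal sketch of the transform step, assuming two hypothetical sources that describe the same kind of record with different field names, date formats, and text casing. The field names and sample values are invented for illustration, and the "load" destination here is just an in-memory list.

```python
from datetime import datetime

# Hypothetical raw records from two sources with different layouts.
source_a = [{"date": "2023-01-05", "amount": "120.50", "region": " north "}]
source_b = [{"day": "05/01/2023", "total": "80", "area": "SOUTH"}]


def transform_a(rec):
    """Convert a source A record to the common structure."""
    return {
        "date": datetime.strptime(rec["date"], "%Y-%m-%d").date(),
        "amount": float(rec["amount"]),
        "region": rec["region"].strip().title(),
    }


def transform_b(rec):
    """Convert a source B record (day/month/year dates) to the common structure."""
    return {
        "date": datetime.strptime(rec["day"], "%d/%m/%Y").date(),
        "amount": float(rec["total"]),
        "region": rec["area"].strip().title(),
    }


# Load step: combine both sources into one uniformly structured data set.
dataset = [transform_a(r) for r in source_a] + [transform_b(r) for r in source_b]

for row in dataset:
    print(row)
```

After the transform, every record has the same keys, real date objects, numeric amounts, and consistently formatted region names, which is what makes the combined data usable for analysis.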

5. Data Visualization
After building the datasets, we need to visualize the data to develop hypotheses or
insights and to explore and evaluate the data. Data visualization applications such
as Tableau or SAS allow us to visualize large numbers of rows and columns of data
from both structured and unstructured databases and to bring insights and
meaningful patterns out of the dataset easily.

6. Statistical Analysis
Statistical analysis is an important aspect of data analysis. It summarizes the data
and our understanding of it in terms of models and graphs, and it also explains how
the data relates to the underlying real world. Statistical analysis is also used to
identify patterns or trends for predictive analytics, which helps in making business
decisions, and it helps to determine the statistical significance of findings in the
data set.
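
One simple way to check statistical significance is a permutation test, sketched below with the standard library only. The sales figures are hypothetical, and the permutation test is just one of many possible techniques, chosen here because it needs no extra packages.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign the labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


# Hypothetical daily sales before and after a campaign.
before = [20, 22, 19, 24, 21, 23, 20, 22]
after = [27, 29, 26, 30, 28, 27, 29, 31]

p_value = permutation_p_value(before, after)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value suggests that a difference this large would rarely appear if the campaign had no effect. The 0.05 cut-off often used for this decision is a convention, not a rule.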

7. Data Model Development


Industries are extremely interested in deploying models that have predictive
capabilities. Data model development consists of defining the model's goals,
conceptualizing the problem, and translating it into a computational model.

R and Python enable you to create a statistical model and to test hypotheses, for
example to reject an invalid or null hypothesis. Modern applications play an
important role in handling the mathematical complexity. Vendors are developing
software as a service, such as Tableau and SAS, to make the analysis process easier
by building models with automated predictive modeling tools designed for business
analysts. Analytics professionals are using machine learning algorithms from
open-source marketplaces or model-building APIs to build predictive application
models.
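
To make "translation into a computational model" concrete, here is a minimal sketch of a predictive model: a straight line fitted by ordinary least squares in plain Python. The spend and sales numbers are invented, and a real project would more likely use an established statistics or machine learning library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend vs. units sold.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(spend, units)

# Use the fitted model to predict sales at a spend level not in the data.
predicted = slope * 6.0 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted={predicted:.2f}")
```

The model goal (predict units sold), the concept (sales grow with spend), and the computational form (a fitted line) correspond to the three parts of model development described above.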


8. Recommendations/Report/Story
This is the final step of the data analytics process, where the decisions reached
are summarized and the results or consequences of the analysis are presented as a
story, report, set of recommendations, or presentation (PPT). Applications such as
Tableau and SAS play an important role in summarizing the analysis through a report
or story. This report includes:

• Customer- and industry-centric outcomes.
• Strategy and decision tree for the industries.
• Identification of business priorities.
• Identification of the target audience or consumers for the products.
• A business case based on measurable outcomes.


Conclusion
For most businesses, enterprises, industries, and government agencies, lack of data
isn’t a problem. There is a huge amount of information available to support clear,
data-driven, business-oriented decisions. With so much data to use in an
analytics-oriented process, we need something more from the available data: the
right knowledge and information. The business needs to know it has the right data
for making data-driven decisions. The business needs to draw accurate conclusions
from that data, information, and knowledge. The business needs data that is
informative and useful for the decision-making process.


Data visualization techniques and information graphics apart from the general
line charts and pie diagrams

https://online.hbs.edu/blog/post/data-visualization-techniques
What is Data?
Data is a raw and unorganized fact that must be processed to make it
meaningful. Data can be simple and yet remain unorganized until it is
processed. Generally, data comprises facts, observations, perceptions,
numbers, characters, symbols, images, etc.

Data is always interpreted, by a human or machine, to derive
meaning. So, on its own, data is meaningless. Data contains numbers,
statements, and characters in a raw form.

What is Information?
Information is a set of data which is processed in a meaningful way
according to the given requirement. Information is processed,
structured, or presented in a given context to make it meaningful and
useful.

It is processed data that possesses context, relevance, and purpose.
It also involves manipulation of raw data.

Information assigns meaning and improves the reliability of the data. It
reduces uncertainty. So, when data is transformed into information, it
no longer has any useless details.
KEY DIFFERENCE
• Data is a raw and unorganized fact that is required to be
  processed to make it meaningful, whereas Information is a set of
  data that is processed in a meaningful way according to the
  given requirement.
• Data does not have any specific purpose, whereas Information
  carries a meaning that has been assigned by interpreting data.
• Data alone has no significance, while Information is significant by
  itself.
• Data never depends on Information, while Information is
  dependent on Data.
• Data is measured in bits and bytes; Information, on the other hand,
  is measured in meaningful units like time, quantity, etc.
• Data can be structured, tabular data, a graph, or a data tree, whereas
  Information is language, ideas, and thoughts based on the given
  data.

Data Vs. Information
| Parameters | Data | Information |
|---|---|---|
| Description | Qualitative or quantitative variables which help to develop ideas or conclusions. | It is a group of data which carries news and meaning. |
| Etymology | Data comes from a Latin word, datum, which means “To give something.” Over time “data” has become the plural of datum. | The word information has Old French and Middle English origins. It has referred to the “act of informing.” It is mostly used for education or other known communication. |
| Format | Data is in the form of numbers, letters, or a set of characters. | Ideas and inferences. |
| Represented in | It can be structured, tabular data, graph, data tree, etc. | Language, ideas, and thoughts based on the given data. |
| Meaning | Data does not have any specific purpose. | It carries meaning that has been assigned by interpreting data. |
| Interrelation | Information that is collected. | Information that is processed. |
| Feature | Data is a single unit and is raw. It alone doesn’t have any meaning. | Information is the product and group of data which jointly carry a logical meaning. |
| Dependence | It never depends on Information. | It depends on Data. |
| Measuring unit | Measured in bits and bytes. | Measured in meaningful units like time, quantity, etc. |
| Support for decision making | It can’t be used for decision making. | It is widely used for decision making. |
| Contains | Unprocessed raw factors. | Processed in a meaningful way. |
| Knowledge level | It is low-level knowledge. | It is the second level of knowledge. |
| Characteristic | Data is the property of an organization and is not available for sale to the public. | Information is available for sale to the public. |
| Dependency | Data depends upon the sources for collecting data. | Information depends upon data. |
| Example | Ticket sales on a band on tour. | Sales report by region and venue. It gives information which venue is profitable for that business. |
| Significance | Data alone has no significance. | Information is significant by itself. |
| Meaning | Data is based on records and observations, which are stored in computers or remembered by a person. | Information is considered more reliable than data. It helps the researcher to conduct a proper analysis. |
| Usefulness | The data collected by the researcher may or may not be useful. | Information is useful and valuable as it is readily available to the researcher for use. |
| Dependency | Data is never designed to the specific need of the user. | Information is always specific to the requirements and expectations because all the irrelevant facts and figures are removed during the transformation process. |
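
The "Example" row of the comparison can be sketched in a few lines of Python: individual ticket sales are the data, and a revenue report by region and venue is the information derived from them. The venues, regions, and prices below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical raw data: individual ticket sales for a band on tour.
ticket_sales = [
    {"venue": "Arena A", "region": "East", "price": 50.0},
    {"venue": "Arena A", "region": "East", "price": 65.0},
    {"venue": "Club B", "region": "West", "price": 30.0},
    {"venue": "Arena A", "region": "East", "price": 50.0},
    {"venue": "Club B", "region": "West", "price": 30.0},
]

# Processing step: total the revenue for each (region, venue) pair.
revenue = defaultdict(float)
for sale in ticket_sales:
    revenue[(sale["region"], sale["venue"])] += sale["price"]

# Information: a report ordered from the highest-earning venue down.
report = sorted(revenue.items(), key=lambda item: item[1], reverse=True)
for (region, venue), total in report:
    print(f"{region:5} {venue:8} {total:8.2f}")
```

No single sale says anything about which venue earns the most; the answer only appears once the records are grouped and summed, which is the sense in which information depends on data.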

DIKW (Data Information Knowledge Wisdom)
DIKW is the model used for discussion of data,
information, knowledge, wisdom and their
interrelationships. It represents structural or
functional relationships between data, information,
knowledge, and wisdom.
Example:
Encoding data into shapes and forms in colour

https://paldhous.github.io/ucb/2018/dataviz/week2.html
Data graphics to address different types of audiences

Designing data visualization and hosting brunch are remarkably
similar arts. Each requires style, sure, but they also call for close
consideration of those consuming the fare. Missing the mark on the
audience can make or break your message. Champagne can make or
break a guest.

Each visualization you create will have an implied reader; you might
not have named them out loud, but somewhere in the back of your
mind you are making something for someone. Maybe you’re preparing
a big presentation for the department director. Perhaps you’d like to
explain a nuanced issue to a collaborator. Your mother might have
called, reminding you that she doesn’t know how you feed yourself. No
matter the audience, the way you construct a visualization should take
the specific reader into account.

The core visualization you create might stay roughly the same, at least
insofar as the data you use and the message you relay. Still, there are
meaningful adjustments that ensure your work is understood by the
people you want to reach. Here are a few of the personalities I’ve
encountered in my work, along with my suggestions for giving each
what they uniquely need.

A note on accessibility: Many of the suggestions you see
below speak to accessibility in the most literal sense: giving
your reader what they need in a way that works for them.
Nothing here precludes the best practices for low- or no-vision
readers. For more on that topic, take a look at the article linked
at the end.

The Big Boss


One of the more intimidating audiences, I like to think of individuals
who are sufficiently higher-ranking as smart but impatient. More
charitably, you might also call them smart and unenviably busy. Time
is a foregone luxury for these poor souls, and it’s kind to make their
lives as easy as possible. Repeat after me: Spoilers first!

Diagram by the author

The Big Boss will know the context already — that’s their job — though
a quick reminder will be appreciated. What they don’t know is why you
are standing at their desk. Explain yourself directly, in simple
language, via the title of your graph. Be obvious and say something
like, “No Significant Effect Observed from Study Drug”.

Use large, horizontal font that they can see without glasses or tilting
their head. Employ arrows and annotations to highlight your talking
points. Rename axis and legend labels to something other than
variable names. Write a caption — or even a few drill-down graphs —
that answer what you know they’ll ask. Your goal is to provide an
efficient, omniscient experience.

Collaborators on your team


The purpose of visualization in these settings is more likely for
diagnostics, change monitoring, or data exploration. You can probably
assume that, since they are direct collaborators, they already have the
context and understand your constraints.

You can relax a bit on the formatting for these readers — within reason,
of course — especially if you are iterating quickly and everyone is on
the same page. Unlike for other audiences, you might want to keep the
variable names on your axes. Installing reference points like threshold
lines, confidence intervals, and annotations will allow your peers to
sanity check the work and notice specific areas of interest.
Because you’re using visualization to pick out patterns and problems,
try to stick with similar aesthetics across team members; this will make
it easier to detect substantive changes and reduce interpretation errors.

Colleagues in your industry

Diagram by the author

I call the folks who work in your industry educated but
uninformed. They’ll understand the high-level topic (like that you
study gallbladders or underwear subscriptions) but not the minutiae
of exactly whose gallbladder or which subscription. With these
audiences you’ll face a balancing act between giving enough
information without leaving anyone behind.
Like for the Big Boss, err on the side of extreme clarity. Use
annotations, plain language labels, and captions liberally. Try to
include enough information in the figure and its accompanying text
that it could stand alone — just in case someone takes a snapshot. If
you’re using a chart type that is unconventional or is particular to your
specialty, make sure to describe how to read it.

The general public


In the broadest possible sense, the general public should have no
presumed knowledge of your area of study, nor should they have any
particular visual literacy skills. You’ll need to lean heavily on the text
that accompanies the visual, including a simple explanation of how to
interpret it. Empathy is the name of the game here.

Consider sticking to more common formats like bars, lines, points, and
(gasp) pies. Use color schemes that have strong associations. Try to use
fewer than four or five categories to keep complexity low, and highlight
colors if there are particularly important areas. The overarching goal is
to provide a straightforward, low-stress experience for individuals who
might already feel out of their depth.

You yourself, the author


With the curtains drawn in the privacy of your own home, you have my
permission to make whatever visualizations you want. That said, you
should consider exercising the self-respect to make a visualization that
You in six months will be able to interpret.
Titles and captions still matter, and annotations will remind you of the
thing you’d pledged to keep an eye on. If you have any expectation of
creating followup figures, consider a color scheme that will serve you
well down the road, too. Five minutes of care now could save you hours
later.

You’ve already gone through the effort of finding something
compelling and drawing a picture of it. It’s merely wise, therefore, to
spend a few moments reflecting on your target audience before hitting
‘send’. Every type of reader will have different needs, and a little
intention goes a long way in making sure they have a good experience
with your work.

On a slightly different note, the brand aesthetic you develop for your
visualizations can have a big impact on the way your work is perceived.
Take a look at this article that describes how you can implement a
unique visualization brand for yourself.
