Data analytics
Introduction
Data: Data is a set of values of qualitative or quantitative
variables. It is information in raw or unorganized form: facts,
figures, characters, symbols, and so on. Data can be numeric,
such as a record of daily weather or daily sales, or
alphanumeric, such as the names of employees and customers.
Information: Information is meaningful or organized data; it
comes from analyzing data.
Database: A database is a modeled collection of data that is
accessible in many ways. A data model can be designed to
integrate the operational data of the organization. The data
model abstracts the key entities involved in an action and their
relationships. Most databases today follow the relational data
model and its variants.
Take the example of a sales organization. A data model for
managing customer orders will involve data about customers,
orders, products, and their interrelationships. The relationship
between the customers and orders would be such that one
customer can place many orders, but one order will be placed
by one and only one customer. This is called a one-to-many
relationship. The relationship between orders and products is
a little more complex. One order may contain many products.
And one product may be contained in many different orders.
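The one-to-many relationship between customers and orders can be sketched as a small relational schema. The table and column names below are illustrative only, using Python's built-in sqlite3 module with an in-memory database:

```python
import sqlite3

# Toy schema for the one-to-many relationship described above.
# Table names, column names, and sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id))""")

cur.execute("INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi')")
# One customer (Asha) places many orders; each order belongs to one customer.
cur.executemany("INSERT INTO orders VALUES (?, ?)", [(101, 1), (102, 1), (103, 2)])

cur.execute("""SELECT c.name, COUNT(o.id)
               FROM customers c JOIN orders o ON o.customer_id = c.id
               GROUP BY c.name ORDER BY c.name""")
print(cur.fetchall())  # [('Asha', 2), ('Ravi', 1)]
```

The foreign key `customer_id` in the orders table is what encodes the one-to-many rule: many order rows can point at the same customer row, but each order row points at exactly one customer. A many-to-many relationship, such as orders and products, would need a third "join" table.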
Data Warehouse:
A data warehouse is an organized store of data from across
the organization, specially designed to support
management decisions.
Data can be extracted from operational databases to
answer a particular set of queries. This data, combined
with other data, can be rolled up to a consistent
granularity and uploaded to a separate data store called
the data warehouse. The data warehouse is thus a
simpler version of the operational database, addressing
reporting and decision-making needs only.
Data Mining:
Data mining is the art and science of discovering useful,
novel patterns from data. A wide variety of patterns
can be found in data.
Evolution of Data Analytics
Why Data Analytics?
Organizations today handle and store billions of rows of
data, possibly with millions of combinations. Data
analytics has been hailed as a 'game changer'
because it lets businesses transform raw data into
something actionable, improving their profits. Some
of the first applications of analytics were found in the
fields of marketing, sales, and customer relationship
management.
Once firms had analyzed the data, they found a
plethora of information, ranging from insights into
customers' needs and consumer behavior to an
understanding of the demand for products and services.
Evolution of Analytics:
1. Analytics era 1.0:
The first era is also known as the era of ‘Business
Intelligence’. Analytics 1.0 was a time of real progress in
gaining an objective, deep understanding of important
business phenomena and giving managers the fact-based
comprehension to go beyond intuition when making
decisions.
For the first time, data about production processes, sales,
customer interactions, and more were recorded,
aggregated, and analyzed. Data sets were small enough in
volume and static enough in velocity to be segregated in
warehouses for analysis.
However, readying a data set for inclusion in a warehouse
was difficult. Analysts spent much of their time preparing
data for analysis.
2. Analytics era 2.0:
Also known as the era of 'Big Data'. The Analytics 1.0 era
lasted until the mid-2000s. As analytics entered the 2.0
phase, the need for powerful new tools, and the opportunity
to profit by providing them, quickly became apparent.
Companies rushed to build new capabilities and acquire
new customers.
Example: LinkedIn created numerous data products,
including People You May Know, Jobs You May Be
Interested In, Groups You May Like, Companies You May
Want to Follow, Network Updates, and Skills and
Expertise. To do so, it built a strong infrastructure and
hired smart, productive data scientists.
Innovative technologies of many kinds had to be created,
acquired, and mastered in this era.
Big data could not fit or be analyzed fast enough on a single
server, so it was processed with Hadoop, an open source
software framework for fast batch data processing across
parallel servers.
To deal with relatively unstructured data, companies turned to
a new class of databases known as NoSQL.
Much information was stored and analyzed in public or private
cloud-computing environments.
Machine-learning methods (semi-automated model
development and testing) were used to rapidly generate
models from the fast-moving data.
The competencies/ skills thus required for Analytics 2.0 were
quite different from those needed for 1.0.
The next-generation quantitative analysts were called data
scientists, and they possessed both computational and
analytical skills.
3. Analytics era 3.0:
Like the first two eras of analytics, this one brings new
challenges and opportunities, both for the companies
that want to compete on analytics and for the vendors
that supply the data and tools with which to do so.
High-performing companies will embed analytics directly
into decision and operational processes, and take
advantage of machine-learning and other technologies
to generate insights in the millions per second rather
than an “insight a week or month.”
Data architectures (e.g., Hadoop) will augment the
traditional approaches, removing scale barriers. Analytics
truly becomes the competitive differentiator for
enterprises that capitalize on the possibilities of this new
era (International Institute for Analytics, 2015).
The pictorial representation of the evolution of Data
Analytics:
The evolution of Data Analytics can be pictured as a
timeline beginning in the early 1980s.
In the 1980s, data analytics consisted only of reporting:
describing what was happening with the data being collected.
In the early 1990s, analytics moved into a second phase,
with more emphasis on analysis: the focus was on
"why did it happen" with the data.
From 2000 onwards, monitoring of data became common,
supported by dashboards and scorecards. This kind of
analysis gives a clear, up-to-date picture of what is
happening with the data.
From 2010 onwards, prediction became the focus:
"what will happen" was the main question asked of the
data. Methods from statistics, data mining, and
optimization were used in this period.
We are now in an era of more detailed analytics that is
prescriptive in nature. In this period we are training our
machines to be smarter, focusing on computations that
take less time and less effort.
So we can conclude that we are in a period dominated
by AI.
Overview of Data Analytics:
1. Descriptive Analytics
2. Diagnostic Analytics
3. Predictive Analytics
4. Prescriptive Analytics
1. Descriptive Analytics
Descriptive Analytics: This is the simplest form of analytics. It
summarizes an organization's existing data to understand what
has happened in the past or is happening currently,
emphasizing "what is going on in the business".
Descriptive analytics mines historical data to understand the
relationship between past events and the present conditions of
the organization.
It is one of the most widely used analytical tools favored by
marketing, finance, sales, and operations teams, as it efficiently
looks into past data and provides an analysis of the changes by
comparing patterns and trends.
Descriptive analytics answers the question, "What happened
in the past?"
It summarizes the current business status through
narrative and visualization.
Data visualization is a natural fit for communicating
descriptive analysis because charts, graphs, and maps
can show trends in data—as well as dips and spikes—in
a clear, easily understandable way.
It highlights past trends that lead to valuable insights for
the business, but it does not explain why those trends
happened.
We use Descriptive Analytics when we want to
summarize the story of an organization's performance
(mostly in the form of Dashboards).
It provides us with a comprehensive view by joining
different things together to highlight hidden trends and
insights.
Information extracted from descriptive analytics helps leadership to take
actions to make things better, and now with the help of Big Data
technologies, management sees the real–time progress of various vital
business metrics. Management sees a complete picture by benchmarking
company performance against the past few years and key competitors.
Below are a few examples of knowledge extracted from descriptive
analytics :
More cars come in for servicing during the monsoon due to
water problems, so the garage should consider hiring part-time
mechanics during the monsoon to meet the temporary demand.
Men convert credit card transactions into EMIs more often than
women do; banks should target men for EMI promotions, as they
are more likely to respond to the promotional campaign.
Internet routers drop many information packets during 4-6 PM
due to high congestion; the support team should provide extra
bandwidth during this time slot for a seamless customer experience.
The health department observes a recurring spike
in malaria cases in a particular locality every
year during the rainy season; they find open water
bodies in that area, which are causing
mosquito breeding.
For example, in an online learning
course with a discussion board,
descriptive analytics could
determine how many students
participated in the discussion, or
how many times a particular
student posted in the discussion
forum.
Essential Tools used in Descriptive Analytics:
Statistical Summary : It provides statistical descriptions
for a given business metric, e.g. Mean, Median, Standard
Deviation, Percentile, Interquartile range, etc.
Z-Score: The z-score tells us how many standard
deviations a particular value x lies from its mean:
z = (x - mean) / standard deviation.
Coefficient of Variation: The ratio of the standard
deviation to the mean.
Interquartile Range : It is an important measure to gauge
the variation in the dataset.
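The tools above can be computed with Python's built-in statistics module. The sales figures below are made-up numbers for illustration:

```python
import statistics

# Illustrative monthly sales figures (made-up data); 500 is an outlier.
sales = [120, 135, 150, 110, 165, 140, 500]

mean = statistics.mean(sales)
median = statistics.median(sales)
stdev = statistics.stdev(sales)  # sample standard deviation

# Z-score: how many standard deviations the value 500 lies from the mean.
z_500 = (500 - mean) / stdev

# Coefficient of variation: standard deviation divided by the mean.
cv = stdev / mean

# Interquartile range: spread of the middle 50% of the data.
q1, _, q3 = statistics.quantiles(sales, n=4)
iqr = q3 - q1

print(round(mean, 1), median, round(z_500, 2), round(cv, 2), iqr)
```

Note how the outlier pulls the mean (about 189) far above the median (140), and produces a z-score above 2, which is a common informal threshold for flagging unusual values.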
2. Diagnostic Analytics
Diagnostic analytics addresses the next logical question,
“Why did this happen?”
Diagnostic analytics provides "Why did it happen in my
business".
It is a bit advanced where analysts examine data in order
to find reasons for business problems or opportunities.
Example: In time-series sales data, diagnostic analytics
would help you understand why sales decreased
or increased in a specific year, e.g., a reduction in
production because of a drop in quality.
Below are a few examples :
A company found that employees were not completing
learning certifications; analysts diagnosed that most
employees were stuck at programming assignments,
where the programming interface was not supportive or
flexible, and there was no way to get hints or help to
proceed further.
Hotel check-in feedback scores were low; analysts
diagnosed that front-office executives were entering
customer details that are not required fields during
check-in itself, and that slow typing and system
navigation were resulting in longer check-in times.
The product return rate was very high last month, and it
was found that more than 60% of returned items had
been supplied by just two vendors, who had provided
the wrong specifications for the products.
3. Named Entity Recognition (NER)
NER is a text analytics technique used for identifying named entities like people,
places, organizations, and events in unstructured text. NER extracts nouns from
the text and determines the values of these nouns.
Use cases of named entity recognition:
• NER is used to classify news content based on the people, places, and
organizations featured in it.
• Search and recommendation engines use NER for information retrieval.
• For large chain companies, NER is used to sort customer service requests and
assign them to a specific city, or outlet.
• Hospitals can use NER to automate the analysis of lab reports.
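Production NER relies on trained statistical models (for example, in libraries such as spaCy), but the core idea of labeling text spans with entity types can be illustrated with a toy dictionary lookup. All names and labels below are made up:

```python
# Toy gazetteer-based entity tagger. Real NER uses trained models; this
# lookup table only illustrates labeling spans of text with entity types.
# All entries are invented examples.
GAZETTEER = {
    "London": "PLACE",
    "Acme Corp": "ORGANIZATION",
    "Ada Lovelace": "PERSON",
}

def tag_entities(text):
    """Return (entity, type) pairs for every gazetteer entry found in text."""
    return [(name, label) for name, label in GAZETTEER.items() if name in text]

print(tag_entities("Ada Lovelace visited Acme Corp in London."))
```

A real NER system goes well beyond such lookups: it uses context to disambiguate (e.g., "Jordan" the country vs. the person) and can recognize entities it has never seen before.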
4. Event extraction
This is a text analytics technique that is an advancement over the named entity
extraction. Event extraction recognizes events mentioned in text content, for example,
mergers, acquisitions, political moves, or important meetings. Event extraction requires
an advanced understanding of the semantics of text content. Advanced algorithms
strive to recognize not only events but the venue, participants, date, and time wherever
applicable. Event extraction is a beneficial technique that has multiple uses across
fields.
Use cases of event extraction:
• Link analysis: This is a technique to understand “who met whom and when” through
event extraction from communication over social media. This is used by law
enforcement agencies to predict possible threats to national security.
Text analytics workflow
Text analytics is a sophisticated technique that involves several pre-steps to gather and cleanse
the unstructured text. There are different ways in which text analytics can be performed. The
following is an example of a model workflow.
1. Data gathering - Text data is often scattered around the internal databases of an
organization, including in customer chats, emails, product reviews, service tickets and Net
Promoter Score surveys. Users also generate external data in the form of blog posts, news,
reviews, social media posts and web forum discussions. While the internal data is readily
available for analytics, the external data needs to be gathered.
2. Preparation of data - Once the unstructured text data is available, it needs to go through
several preparatory steps before machine learning algorithms can analyze it. In most of the text
analytics software, this step happens automatically. Text preparation includes several
techniques using natural language processing as follows:
a. Tokenization: In this step, the text analysis algorithms break the continuous string
of text data into tokens, smaller units that make up entire words or phrases.
For instance, character tokens could be each individual letter in the word F-I-S-H,
or the word could be broken into subword tokens: Fish-ing. Tokens are the basis of
all further analysis. This step also discards unwanted content in the text, including
white space.
d. Lemmatization and stemming: These are two processes used in data preparation to remove the
suffixes and affixes associated with tokens and retain the dictionary form, or lemma.
e. Stopword removal: In this phase, all tokens that occur frequently but add no value to the
analysis are removed. These include words such as 'and', 'the', and 'a'.
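The preparation steps above can be sketched in a few lines of plain Python. A real pipeline would use an NLP library such as NLTK or spaCy; the stopword list and suffix-stripping "stemmer" below are deliberately naive illustrations:

```python
import re

# Minimal sketch of text preparation: tokenization, stopword removal,
# and a crude suffix-stripping stemmer. Illustrative only.
STOPWORDS = {"and", "the", "a", "is", "of"}

def tokenize(text):
    # Break the string into lowercase word tokens, discarding
    # punctuation and white space.
    return re.findall(r"[a-z]+", text.lower())

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def stem(token):
    # Naive stemming: strip a common suffix to approximate the lemma.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

text = "Fishing is the art and science of catching fish"
tokens = [stem(t) for t in remove_stopwords(tokenize(text))]
print(tokens)  # ['fish', 'art', 'science', 'catch', 'fish']
```

Notice that "Fishing" and "fish" are reduced to the same token, which is exactly why stemming helps: frequency counts and classifiers then treat them as one term.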
3. Text analytics - After the preparation of unstructured text data, text analytics techniques
can now be applied to derive insights. Several techniques are used for text
analytics; prominent among them are text classification and text extraction.
Once the text analytics methods are used to process the unstructured data,
the output information can be fed to data visualization systems. The results
can then be visualized in the form of charts, plots, tables, infographics, or
dashboards. This visual data enables businesses to quickly spot trends in the
data and make decisions.
Web Analytics
What is web analytics?
Web analytics is the process of analyzing the behavior of visitors to a website. This
involves tracking, reviewing and reporting data to measure web activity, including the
use of a website and its components, such as webpages, images and videos.
Data collected through web analytics may include traffic sources, referring sites,
page views, paths taken and conversion rates. The compiled data often forms a part
of customer relationship management analytics (CRM analytics) to facilitate and
streamline better business decisions.
Web analytics enables a business to retain customers, attract more visitors and
increase the dollar volume each customer spends.
Analytics can help in the following ways:
• Determine the likelihood that a given customer will repurchase a product after purchasing it in the past.
• Personalize the site to customers who visit it repeatedly.
• Monitor the amount of money individual customers or specific groups of customers spend.
• Observe the geographic regions from which the most and the least customers visit the site and purchase
specific products.
• Predict which products customers are most and least likely to buy in the future.
The objective of web analytics is to serve as a business metric for promoting specific products to the
customers who are most likely to buy them and to determine which products a specific customer is most
likely to purchase. This can help improve the ratio of revenue to marketing costs.
In addition to these features, web analytics may track the clickthrough and drilldown behavior of
customers within a website, determine the sites from which customers most often arrive, and
communicate with browsers to track and analyze online behavior. The results of web analytics are
provided in the form of tables, charts and graphs.
Web analytics tools
• Crazy Egg. Crazy Egg is a tool that tracks where customers click on a
page. This information can help organizations understand how visitors
interact with content and why they leave the site. The tool tracks visitors,
heatmaps and user session recordings.
Skills for Business Analytics
Business analytics refers to the process of extracting insights
from data to make informed decisions regarding a business
question or challenge.
Here are five skills you can develop to improve your understanding
of business analytics.
1. Data Literacy
One of the fundamental skills to build before diving into business
analytics is data literacy. At its most basic, data literacy means
you’re familiar with the language of data, including different types,
sources, and analytical tools and techniques.
Being data literate also means you’re comfortable working with
data in various ways—from evaluating it to manipulating it and
gaining insights.
2. Data Collection
The first step in leveraging analytics to drive business decisions is
to collect a data sample from which conclusions can be drawn.
In some cases, a dataset already exists, and it’s up to the business
analyst to pull relevant information. For example, if you’re
interested in discovering a retail store’s most profitable products,
you might start by pulling historical sales data for transactions
that took place over a specific period.
3. Statistical Analysis
Several statistical methods can be helpful when it comes to
analysis, including:
Hypothesis testing, which is a statistical means of testing an
assumption.
Linear regression analysis, which can be used to evaluate the
relationship between two variables.
Multiple regression analysis, which is used to evaluate the
relationship between three or more variables.
Through these forms of analysis, you can draw insights and
conclusions that answer your business question.
4. Communication
While insights derived from reliable data are key to making
informed business decisions, it’s likely that other stakeholders
need to be involved in the decision-making process. For this
reason, effectively communicating your findings is essential.
Without strong communication skills, the value of your analyses
can go unrealized.
5. Data Visualization
Data visualization goes hand in hand with strong communication,
as it allows you to present findings in an easily digestible format
for those who may not be as data literate as you are.