Limitations of Business Intelligence (BI)
Commonly cited limitations include the limited scope of data analysis and a dependence on IT staff, along with the two discussed below.
While BI tools are excellent at analysing historical data, they have limited
ability to predict the future. This means that they may not be able to
provide insights into future trends or identify potential problems before
they occur.
BI tools are designed to be objective and unbiased, but they can still be
influenced by human bias. This can be a problem for organizations that
rely on BI to make decisions. To address this limitation, organizations
need to be aware of the potential for bias in their BI solutions. They
should also invest in tools and processes that can help mitigate bias,
such as automated decision-making and diverse data sources.
Moreover, it's essential to note that the limitations of BI can vary from
one organization to another based on their data requirements,
organizational structure, and IT infrastructure. Therefore, organizations
must evaluate their BI tools regularly to identify any limitations and
develop strategies to overcome them.
ETL Process
• ETL (Extract, Transform, Load) is the process of extracting data from different data sources, transforming it according to business rules and calculations, and loading the transformed data into a data warehouse.
• The ETL function lies at the core of Business Intelligence systems.
• With ETL, enterprises can obtain historical, current, and predictive views of real business data.
For more on the importance of ETL in business intelligence, how ETL works, and ETL business applications, see https://www.mantralabsglobal.com/blog/etl-in-business-intelligence/
What is ETL?
ETL, which stands for extract, transform and load, is a data integration process that
combines data from multiple data sources into a single, consistent data store that is
loaded into a data warehouse or other target system.
As databases grew in popularity in the 1970s, ETL was introduced as a process for integrating and loading data for computation and analysis, eventually becoming the primary method of processing data for data warehousing projects.
ETL provides the foundation for data analytics and machine learning workstreams.
Through a series of business rules, ETL cleanses and organizes data in a way that addresses specific business intelligence needs, like monthly reporting, but it can also tackle more advanced analytics, which can improve back-end processes or end-user experiences. ETL is often used by an organization to:
Extract data from legacy systems
Cleanse the data to improve data quality and establish consistency
Load data into a target database
ETL vs. ELT
The most obvious difference between ETL and ELT is the difference in order of
operations. ELT copies or exports the data from the source locations, but instead of
loading it to a staging area for transformation, it loads the raw data directly to the
target data store to be transformed as needed.
While both processes leverage a variety of data repositories, such as databases, data warehouses, and data lakes, each has its advantages and disadvantages. ELT is particularly useful for high-volume, unstructured datasets, since loading can occur directly from the source. ELT can also be better suited to big data management because it requires little upfront planning for data extraction and storage. The ETL process, on the other hand, requires more definition at the onset. Specific data points need to be identified for extraction, along with any potential "keys" to integrate across disparate source systems. Even after that work is completed, the business rules for the data transformations need to be constructed. This work often depends on the data requirements for a given type of analysis, which determine the level of summarization the data needs. While ELT has become increasingly popular with the adoption of cloud databases, it has the disadvantage of being the newer process: best practices for it are still being established.
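To make the difference in order of operations concrete, here is a minimal Python sketch with SQLite standing in for the target warehouse; all table and column names are hypothetical.

```python
# Contrast of ETL vs. ELT order of operations (illustrative names only).
import sqlite3

rows = [("2024-01-05", "eu", "1,200.50"), ("2024-01-06", "us", "980.00")]

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE raw_sales (day TEXT, region TEXT, amount TEXT)")
warehouse.execute("CREATE TABLE sales (day TEXT, region TEXT, amount REAL)")

# ETL: transform in the pipeline (here, plain Python) BEFORE loading.
cleaned = [(d, r.upper(), float(a.replace(",", ""))) for d, r, a in rows]
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", cleaned)

# ELT: load the raw data first, then transform INSIDE the target store
# with SQL, as and when the data is needed.
warehouse.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", rows)
warehouse.execute("""
    INSERT INTO sales
    SELECT day, UPPER(region), CAST(REPLACE(amount, ',', '') AS REAL)
    FROM raw_sales
""")
```

In both cases the same cleansed rows end up in the sales table; what changes is where, and when, the transformation work happens.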
How ETL works
The easiest way to understand how ETL works is to understand what happens in
each step of the process.
Extract
During data extraction, raw data is copied or exported from source locations to a
staging area. Data management teams can extract data from a variety of data
sources, which can be structured or unstructured. Those sources include but are not
limited to:
SQL or NoSQL servers
CRM and ERP systems
Flat files
Email
Web pages
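As an illustration, here is a minimal extract sketch in Python; the SQL database and CSV file are hypothetical stand-ins built in memory so the example runs on its own.

```python
# Extract: copy raw data from source systems into a staging area,
# represented here as a dict of pandas DataFrames.
import io
import sqlite3
import pandas as pd

# Stand-in for a CRM source database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex')")

# Stand-in for a flat-file export.
orders_csv = io.StringIO("order_id,customer_id,amount\n10,1,99.5\n11,2,42.0")

staging = {
    "customers": pd.read_sql("SELECT * FROM customers", conn),
    "orders": pd.read_csv(orders_csv),
}
print(staging["orders"])
```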
Transform
In the staging area, the raw data undergoes data processing. Here, the data is
transformed and consolidated for its intended analytical use case. This phase can
involve the following tasks:
Filtering, cleansing, de-duplicating, validating, and authenticating the data
Performing calculations, translations, or summarizations based on the raw data; this can include changing row and column headers for consistency, converting currencies or other units of measurement, editing text strings, and more
Conducting audits to ensure data quality and compliance
Removing, encrypting, or protecting data governed by industry or governmental regulators
Formatting the data into tables or joined tables to match the schema of the target data warehouse
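A minimal sketch of a few of these transform tasks in pandas, using hypothetical column names and a made-up exchange rate:

```python
# Transform: cleanse, de-duplicate, convert units, and normalize headers.
import pandas as pd

orders = pd.DataFrame({
    "OrderID": [10, 10, 11],                 # note the duplicated row
    "Amount_EUR": [99.5, 99.5, 42.0],
    "Customer": [" acme ", " acme ", "globex"],
})

orders = orders.drop_duplicates()                        # de-duplicate
orders = orders[orders["Amount_EUR"] > 0]                # filter/validate
orders["Customer"] = orders["Customer"].str.strip()      # cleanse strings
orders["amount_usd"] = orders["Amount_EUR"] * 1.09       # convert currency
orders = orders.rename(columns={"OrderID": "order_id"})  # consistent headers
print(orders)
```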
Load
In this last step, the transformed data is moved from the staging area into a target
data warehouse. Typically, this involves an initial loading of all data, followed by
periodic loading of incremental data changes and, less often, full refreshes to erase
and replace data in the warehouse. For most organizations that use ETL, the
process is automated, well-defined, continuous and batch-driven. Typically, ETL
takes place during off-hours when traffic on the source systems and the data
warehouse is at its lowest.
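A minimal sketch of the load step, again with pandas and SQLite standing in for the warehouse; the initial load replaces the target table, while the periodic incremental load appends new rows:

```python
# Load: initial full load followed by an incremental load of changes.
import sqlite3
import pandas as pd

warehouse = sqlite3.connect(":memory:")  # stand-in for the target warehouse

transformed = pd.DataFrame({"order_id": [10, 11], "amount_usd": [108.5, 45.8]})
increment = pd.DataFrame({"order_id": [12], "amount_usd": [77.0]})

# Initial load: replace whatever is in the target table.
transformed.to_sql("fact_orders", warehouse, if_exists="replace", index=False)

# Incremental load: append only the new or changed rows.
increment.to_sql("fact_orders", warehouse, if_exists="append", index=False)

print(warehouse.execute("SELECT * FROM fact_orders").fetchall())
```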
The benefits and challenges of ETL
ETL solutions improve quality by performing data cleansing prior to loading the data into a different repository. A time-consuming batch operation, ETL is recommended more often for creating smaller target data repositories that require less frequent updating, while other data integration methods, including ELT (extract, load, transform), change data capture (CDC), and data virtualization, are used to integrate larger, frequently changing volumes of data or real-time data streams.
ETL tools
In the past, organizations wrote their own ETL code. There are now many open
source and commercial ETL tools and cloud services to choose from. Typical
capabilities of these products include the following:
Comprehensive automation and ease of use: Leading ETL tools automate
the entire data flow, from data sources to the target data warehouse. Many
tools recommend rules for extracting, transforming and loading the data.
A visual, drag-and-drop interface: This functionality can be used for
specifying rules and data flows.
Support for complex data management: This includes assistance with
complex calculations, data integrations, and string manipulations.
Security and compliance: The best ETL tools encrypt data both in motion and
at rest and are certified compliant with industry or government regulations, like
HIPAA and GDPR.
What is OLTP?
OLTP (online transactional processing) enables the rapid, accurate data
processing behind ATMs and online banking, cash registers and ecommerce,
and scores of other services we interact with each day.
OLTP, or online transactional processing, enables the real-time execution of
large numbers of database transactions by large numbers of people, typically
over the internet.
A database transaction is a change, insertion, deletion, or query of data in a
database. OLTP systems (and the database transactions they enable) drive
many of the financial transactions we make every day, including online
banking and ATM transactions, e-commerce and in-store purchases, and
hotel and airline bookings, to name just a few. In each of these cases, the
database transaction also remains as a record of the corresponding financial
transaction. OLTP can also drive non-financial database exchanges, including
password changes and text messages.
In OLTP, the common, defining characteristic of any database transaction is
its atomicity (or indivisibility)—a transaction either succeeds as a whole or
fails (or is canceled). It cannot remain in a pending or intermediate state.
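The following minimal sketch illustrates this atomicity with SQLite and a made-up funds transfer: either both balance updates commit, or neither does.

```python
# Atomicity: a transaction succeeds as a whole or fails as a whole.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance REAL)")
db.executemany("INSERT INTO accounts VALUES (?, ?)",
               [("alice", 100.0), ("bob", 50.0)])

def transfer(amount, src, dst):
    try:
        with db:  # opens a transaction; commits on success, rolls back on error
            db.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                       (amount, src))
            (balance,) = db.execute("SELECT balance FROM accounts WHERE id = ?",
                                    (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            db.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                       (amount, dst))
    except ValueError:
        pass  # the whole transaction was rolled back; no partial update remains

transfer(30.0, "alice", "bob")   # succeeds: both updates are applied
transfer(500.0, "alice", "bob")  # fails: neither update survives
print(db.execute("SELECT * FROM accounts ORDER BY id").fetchall())
```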
What is OLAP?
OLAP (for online analytical processing) is software for performing multidimensional
analysis at high speeds on large volumes of data from a data warehouse, data mart,
or some other unified, centralized data store.
Most business data have multiple dimensions—multiple categories into which the
data are broken down for presentation, tracking, or analysis. For example, sales
figures might have several dimensions related to location (region, country,
state/province, store), time (year, month, week, day), product (clothing,
men/women/children, brand, type), and more.
But in a data warehouse, data sets are stored in tables, each of which can organize
data into just two of these dimensions at a time. OLAP extracts data from multiple
relational data sets and reorganizes it into a multidimensional format that enables
very fast processing and very insightful analysis.
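For intuition, here is a minimal pandas sketch of such a multidimensional reorganization, using made-up sales figures broken down by region, year, and product:

```python
# OLAP-style "slice and dice": aggregate one measure across dimensions.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["EU", "EU", "US", "US", "US"],
    "year":    [2023, 2024, 2023, 2024, 2024],
    "product": ["clothing", "clothing", "footwear", "clothing", "footwear"],
    "amount":  [120.0, 150.0, 90.0, 200.0, 75.0],
})

cube = sales.pivot_table(index=["region", "product"], columns="year",
                         values="amount", aggfunc="sum", fill_value=0)
print(cube)
```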
With time, BI vendors started developing tools that were not as cost-intensive or technically
complicated—and the dashboard emerged. Dashboards still depended on IT staff to get the
data in place, but they made BI somewhat accessible for users to generate reports and run
queries themselves.
Big tech companies such as Microsoft, IBM, Oracle, SAP, and SAS were the vanguards of
this BI era and dominated the “leaders” space in Gartner’s 2010 Magic Quadrant for BI
Platforms (full document available to Gartner clients only).
What is self-service BI?
Self-service BI is the approach of implementing data analytics that enables users to access
and use data without statistical, analytical, or data handling expertise.
This approach depends on BI tools that allow users to filter, sort, analyze, and visualize data to extract insights without depending on developers, data specialists, or IT specialists.
The idea is simple: grant users direct access to intelligence and help them slice and dice data as needed.
Business Intelligence Tools
1. Microsoft Power BI
Power BI is a powerful BI tool that offers both a desktop application (which is free)
and a cloud-based platform for sharing reports and dashboards. Power BI is a full-
featured tool with the ability to transform and visualize your data, along with some
impressive predictive modeling and AI-based features that make this tool a true
leader in the BI market.
Power BI is particularly ideal for beginners who are familiar with Microsoft products,
providing a seamless transition into the world of BI. Get up to speed quickly with
this Introduction to Power BI course by DataCamp. If you wish to take your Power BI
skills to the next level, consider taking the Power BI Fundamentals skill track.
Power BI has two pricing options: a Pro subscription paid per user or a Premium
plan per user or per capacity if you require access to advanced enterprise-scale
features.
Key features:
Data connectors and integrations for a wide range of data sources.
User-friendly interface with a drag-and-drop report builder and powerful data
visualization capabilities.
Natural language query for data exploration.
Web-based nature - Power BI reports can be published to the web and easily
shared with others. In addition, reports can even be designed directly from
Power BI Service on the web with direct integration with some applications.
Power BI can also integrate with Python, bringing the power of data science directly into your Power BI reports.
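For illustration, here is a minimal sketch of the kind of script a Power BI Python visual runs. Inside Power BI, the fields added to the visual arrive as a pandas DataFrame named dataset; the sketch fabricates one with hypothetical columns so it also runs standalone.

```python
# Script for a Power BI Python visual (or standalone, with fake data).
import pandas as pd
import matplotlib.pyplot as plt

# In Power BI this DataFrame is supplied automatically as `dataset`.
dataset = pd.DataFrame({"month": ["Jan", "Feb", "Mar"],
                        "sales": [120, 135, 160]})

ax = dataset.plot.bar(x="month", y="sales", legend=False)
ax.set_title("Monthly sales")
plt.show()  # Power BI renders the resulting matplotlib figure in the report
```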
2. Tableau
Tableau is another leading BI tool with a strong emphasis on usability, particularly for
non-technical users. Tableau can integrate with dozens of applications easily
through the use of their pre-built data connectors, along with advanced data
discovery and visualization capabilities.
However, a unique aspect of Tableau is their Tableau Public offering, which is freely
available to anyone, from complete beginners to seasoned experts, to learn more
about Tableau and improve their analytical and BI skills. The Introduction to
Tableau course by DataCamp is a great place to start learning about Tableau. If you
want a more in-depth walkthrough of Tableau, then the Tableau Fundamentals skill
track is an excellent place to start.
Tableau has a subscription-based pricing structure split into three levels, making it
an affordable option even for small teams.
Key features:
Wide range of data connectors for almost every kind of data source.
Interactive, dynamic dashboards that users can either develop and automate or throw together for quick ad-hoc analyses of their data.
Real-time data integration that allows users to visualize and analyze live data.
3. Domo
Domo is an entirely cloud-based BI platform offering easy-to-use data visualization
and reporting capabilities. Domo is also particularly great for beginners thanks to its
user-friendly and intuitive interface, pre-built connectors, and the simplest ETL
solution out of all BI tools.
Domo offers a limited free version to learn more about the tool and try it out. After
that, they offer three pricing tiers depending on your requirements. Domo has based
its pricing structure on credits rather than users. You can have an unlimited number
of users but will need to purchase more credits if your usage causes them to run out.
Key features:
Over 1,000 pre-built connectors for various data sources.
Magic ETL makes data transformation and cleaning a breeze even without
extensive technical expertise.
Advanced embedded analytics with the goal of having “Domo Everywhere.”
4. Zoho Analytics
Zoho Analytics is a user-friendly BI tool that is highly affordable, making it especially
attractive for small businesses and beginners. However, Zoho Analytics does not
compromise on functionality to keep costs low. You can expect many of the same
impressive data preparation and visualization features as the other BI tools on this
list.
Zoho Analytics offers four pricing tiers depending on the number of users you have
and the volume of data you want to import into the platform. If your organization has
large volumes of data or many users, Zoho Analytics can provide a more tailored
plan to you with the option for dedicated servers.
Key features:
Zoho DataPrep offers self-service data preparation and management.
Pre-built dashboards and widgets.
Integration with over 250 data sources and direct connection to more than 50
popular data apps.
‘Ask Zia’ is their natural language query feature that lets users ask questions
about their data in plain English and receive visualized answers.
5. Sisense
Sisense is known for its powerful and incredibly fast data analysis capabilities due to
its unique data architecture that combines data preparation, data modeling, and data
visualization into a single, unified platform. However, the power and sophistication of
the platform should not deter beginners, as Sisense has taken great care in
designing a no-code user experience with an excellent drag-and-drop dashboard
builder.
Sisense offers self-hosted and cloud-based pricing options tailored to your needs.
Key features:
Single-stack architecture for data preparation and analysis.
Sisense’s in-chip data engine is designed for high-speed data processing and
can handle large datasets with ease.
An intuitive drag-and-drop dashboard builder that makes it accessible to a
broader range of users.
AI and machine-learning integration for advanced analytical capabilities.
Features of Tableau
1. Tableau Dashboard
Tableau dashboards provide a holistic view of your data by means of visualizations, visual objects, text, and more. Dashboards are very informative, as they can present data in the form of stories, enable the addition of multiple views and objects, provide a variety of layouts and formats, and let users apply suitable filters. You can even easily copy a dashboard, or specific elements of it, from one workbook to another.
2. Collaboration and Sharing
Tableau provides convenient options to collaborate with other users and
instantly share data in the form of visualizations, sheets, dashboards, etc. in
real-time. It allows you to securely share data from various data sources
such as on-premise, on-cloud, hybrid, etc. Instant and easy collaboration and
data sharing help in getting quick reviews or feedback on the data leading to a
better overall analysis of it.
3. Live and In-memory Data
Tableau supports both live connections to data sources and data extraction from external data sources as in-memory data. This gives the user the flexibility to use data from more than one type of data source without any restrictions. You can use data directly from the data source by establishing a live connection, or keep data in-memory by extracting it from a data source, as per your requirements. Tableau provides additional features to support data connectivity, such as automatic extract refreshes and notifying the user when a live connection fails.
4. Data Sources in Tableau
Tableau offers a myriad of data source options you can connect to and fetch
data from. Data sources ranging from on-premise files, spreadsheets,
relational databases, non-relational databases, data warehouses, big data, to
on-cloud data are all available in Tableau. One can easily establish a secure connection to any of these data sources from Tableau and use that data, along with data from other sources, to create a combined view in the form of visualizations. Tableau also supports many different data connectors, such as Presto, MemSQL, Google Analytics, Google Sheets, Cloudera, Hadoop, Amazon Athena, Salesforce, SQL Server, Dropbox, and many more.
5. Advanced Visualizations (Chart Types)
One of the key features of Tableau, and the one behind much of its popularity, is its wide range of visualizations. In Tableau, you can create visualizations as basic as a:
Bar chart
Pie chart
and as advanced as a:
Histogram
Gantt chart
Bullet chart
Motion chart
Treemap
Boxplot
and many more. You can easily create any of these visualizations by choosing the visualization type from the Show Me tab.
6. Maps
Yet another important feature of Tableau is the map. Tableau has a lot of pre-
installed information on maps such as cities, postal codes, administrative
boundaries, etc. This makes the maps created on Tableau very detailed and
informative. You can add different geographic layers to the map as per your requirements and create informative maps in Tableau with your data. The different kinds of maps available in Tableau include heat maps, flow maps, choropleth maps, point distribution maps, etc.
7. Robust Security
Tableau takes special care of data and user security. It has a fool-proof
security system based on authentication and permission systems for data
connections and user access. Tableau also gives you the freedom to integrate
with other security protocols such as Active Directory, Kerberos, etc. An
important point to note here is that Tableau practices row-level filtering which
helps in keeping the data secure.
8. Mobile View
Tableau acknowledges the importance of mobile phones in today’s world and
provides a mobile version of the Tableau app. You can create dashboards and reports in such a manner that they are also compatible with mobile devices. Tableau can create customized mobile layouts for your dashboard specific to your mobile device, with options such as adding new phone layouts and interactive offline previews. Hence, the mobile view gives Tableau users a lot of flexibility and convenience in handling their data on the go.
9. Ask Data
The Ask Data feature of Tableau makes it even more popular with users globally. This feature makes exploring data as simple as running a Google-style search: you just type a query about your data in natural language, and Tableau presents the most relevant answers, not only as text but also as visuals. For instance, if what you searched for is already present in a bar graph, Ask Data will find and open that bar graph for you instantly. Such features make data more accessible to users, who can easily dig deep into data and find new insights and patterns.
10. Trend Lines and Predictive Analysis
Another extremely useful feature of Tableau is its support for time series analysis and forecasting. Easy creation of trend lines and forecasts is possible thanks to Tableau's powerful backend and dynamic front end. You can easily get data predictions, such as a forecast or a trend line, by selecting a few options and performing drag-and-drop operations with the relevant fields.
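Tableau does this through its UI rather than code, but for intuition, here is a minimal Python sketch (with made-up monthly figures) of fitting a linear trend line and extending it one step ahead:

```python
# Fit a linear trend to monthly sales and forecast the next month.
import numpy as np

rng = np.random.default_rng(0)
months = np.arange(1, 13)                        # Jan..Dec
sales = 100 + 5 * months + rng.normal(0, 3, 12)  # synthetic data

slope, intercept = np.polyfit(months, sales, deg=1)  # the trend line
forecast_next = slope * 13 + intercept               # extend one month ahead

print(f"trend: sales = {slope:.1f} * month + {intercept:.1f}")
print(f"forecast for month 13: {forecast_next:.1f}")
```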