
Topic 1 - Data Architecture For BI

The document outlines the course DAMG 7370 - Advanced Data Architectures for Business Intelligence, taught by Dr. Zheng Zheng at Northeastern University, focusing on designing data architectures for various data types and analytical uses. It covers topics such as data engineering, BI applications, and the roles of data professionals, with an emphasis on practical skills and project-based learning. The course also discusses the importance of business intelligence in decision-making and the evolution of BI from traditional to modern approaches.


DAMG 7370 - Advanced Data Architectures for Business Intelligence

Topic 1 - Data Architecture for BI

Zheng Zheng, Ph.D.

Northeastern University

Acknowledgments: Material mainly based on BI Guidebook by Rick Sherman and Data Engineering Course on Coursera

Zheng Zheng Ph.D. Designing Advanced Data Architectures for Business Intelligence 1
Outline
➢Basic Information

➢Course Syllabus

➢Business Intelligence (BI)

➢Data Engineering Ecosystem

➢Architecture High-level Overview

Lecturer
• Zheng Zheng, Ph.D.
➢Assistant Teaching Professor
➢INFO 6205, INFO 5100, INFO 5002, DAMG 7370
➢Chief Scientist
➢TCTM Kids IT Education Inc. (NASDAQ: TCTM)
➢Chief Technology Officer (CTO)
➢TechArena Canada Inc. (TCTM Canada branch)

• Education
➢Ph.D., Computer Science, McMaster University, Canada
➢M.Eng, Computer Technology, University of Chinese Academy of Sciences, China

• Email: zh.zheng@northeastern.edu

Course Details
• Course Description
Focuses on designing advanced data architectures supporting structured,
unstructured, and semi-structured data sources; hybrid integration and data
engineering; and analytical uses by casual information consumers, power users,
and data scientists. Technologies include databases (relational, columnar, in-
memory, and NoSQL); hybrid data, application, and cloud integration; data
preparation; data virtualization; descriptive, diagnostic, predictive, and prescriptive
analytics; and on-premise and on-cloud deployments.
• Prerequisites
Students should have basic knowledge of database management systems
and basic programming skills (especially in Python). Students are assumed
to know Python and SQL well enough to understand all code used in
the slides and textbook.

Learning Objectives (1)
Students should know and understand:
• Data architecture
• Data engineering lifecycle
• Data modeling and dimensional modeling
• Data integration and pipelines
• Data warehousing
• BI dimensional modeling
• BI application, design and development

Learning Objectives (2)
Students should be able to:
• Collect and analyse requirements of the project
• Select and design the data repository for the data
• Develop the architectural framework
• Design the proper approach for BI data models
• Integrate the data
• Generate advanced analytics with data mining and machine
learning techniques
• Deal with “shadow systems”
• Manage the full data engineering lifecycle

Resources and Textbooks
• Textbooks
• Business Intelligence Guidebook: From Data Integration to Analytics, 1st
Edition by Rick Sherman, 2014
• Data Mining Concepts and Techniques, 3rd Edition by Jiawei Han, Micheline
Kamber and Jian Pei, 2011

• Resources
• Data Engineering Course and Machine Learning Course on Coursera

Course Outline
• Topic 1 – Data Architecture for BI
• Topic 2 – Getting to Know the Data
• Topic 3 – Data Engineering Lifecycle
• Topic 4 – Data Modeling and Dimensional Modeling
• Topic 5 – Python for Data Engineering
• Topic 6 – Data Integration
• Topic 7 – Data Warehousing
• Topic 8 – BI Design

Evaluation
• One assignment (15%)
• One midterm (25%)
• One group project (60%)
➢ 10% proposal
➢ 10% presentation
➢ 40% design

Group Project
• A data architecture for business intelligence
➢ Requirements Collection
➢ Data ETL
➢ Data Storage
➢ Data Mining
➢ Data Visualization

• Requirements
➢ Each group consists of 2-3 students
➢ Can use any data engineering tools

• Submissions (Notion or Overleaf)
➢ Project Proposal (≤ 2 pages)
➢ Final Project Report (≤ 10 pages)
➢ Project Presentation
Teaching & Learning With Generative AI
• Utilizing ChatGPT or other AI tools is becoming more common. While I would
prefer you not use these tools and instead commit to the productive struggle that
is learning, I recognize that these tools are not going away. Rather than ban them,
we will treat them similarly to other resources you use. This means you MUST
follow the four points below:
• Give notice that you used the AI tool, which one you used and how you used
it in the comments of your code.
• Rigorously test and alter the program to suit the assignment and your
understanding.
• You must understand any code you submit and be prepared to explain it to
me.
• All comments should be your own words. Sample code with the appropriate
credit statement will be shown in class.

Recent Interview Questions


What is BI?
• Wikipedia
• BI comprises the strategies and technologies used by enterprises for the data
analysis and management of business information.

• Tableau
• Business intelligence combines business analytics, data mining, data visualization,
data tools and infrastructure, and best practices to help organizations make more
data-driven decisions.

• Microsoft Power BI
• BI uncovers insights for making strategic decisions. Business intelligence tools analyze
historical and current data and present findings in intuitive visual formats.

History of BI (1)
• In 1865
• Richard Millar Devens first used the term BI

• In 1958
• Hans Peter Luhn, “A Business Intelligence System”
An automatic system developed to disseminate information
to the various sections of any industrial, scientific, or
government organization. This intelligence system will
utilize data-processing machines for auto-abstracting and
auto-encoding of documents and for creating interest
profiles for each of the action points in an organization.

History of BI (2)
• In 1989
• Howard Dresner proposed “business intelligence” as an umbrella term for
concepts and methods to improve business decision
making by using fact-based support systems

• In the 1990s
• A number of BI vendors started appearing in the market

• 2000 onwards
• BI started becoming “self-service”

Traditional BI vs. Modern BI
Traditional BI
➢ common approach for regular
reporting and answering static
queries.

Cycle of analytics
➢ a cycle of data access,
discovery, exploration, and
information sharing

Modern BI
➢ interactive and approachable.
➢ users could visualize data and
answer their own questions.

How Does BI Work?
Step 1: Collect and transform data from multiple sources

Step 2: Uncover trends and inconsistencies

Step 3: Use data visualization to present findings

Step 4: Take actions on insights in real time
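
The four steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a real BI tool: the two in-memory "sources", the field names (region, month, revenue), the 20% drop threshold, and the text bar chart are all assumptions made for the example.

```python
from collections import defaultdict

# Step 1: collect and transform data from multiple (hypothetical) sources.
crm_rows = [
    {"region": "East", "month": "2024-01", "revenue": "1000"},
    {"region": "East", "month": "2024-02", "revenue": "700"},
]
web_rows = [
    {"region": "West", "month": "2024-01", "revenue": "500"},
    {"region": "West", "month": "2024-02", "revenue": "650"},
]

def collect(*sources):
    """Merge all sources and convert revenue strings to numbers."""
    rows = []
    for source in sources:
        for row in source:
            rows.append({**row, "revenue": float(row["revenue"])})
    return rows

def find_drops(rows, threshold=0.2):
    """Step 2: flag regions whose revenue fell by more than `threshold` month over month."""
    by_region = defaultdict(list)
    for row in sorted(rows, key=lambda r: r["month"]):
        by_region[row["region"]].append(row)
    drops = []
    for region, series in by_region.items():
        for prev, cur in zip(series, series[1:]):
            change = (cur["revenue"] - prev["revenue"]) / prev["revenue"]
            if change < -threshold:
                drops.append((region, cur["month"], round(change, 2)))
    return drops

def bar_chart(rows):
    """Step 3: present findings as a simple text bar chart (one '#' per 100 units)."""
    lines = []
    for row in rows:
        bar = "#" * int(row["revenue"] // 100)
        lines.append(f'{row["region"]:<5}{row["month"]} {bar}')
    return "\n".join(lines)

rows = collect(crm_rows, web_rows)
drops = find_drops(rows)
print(bar_chart(rows))
# Step 4: act on the insight; here the "action" is just printing an alert.
for region, month, change in drops:
    print(f"ALERT: {region} revenue changed {change:.0%} in {month}")
```

In a real deployment each step would typically be handled by a dedicated tool (an integration pipeline, an analytics engine, a dashboard); the sketch only shows how the steps feed into one another.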

Why Is BI Important?
• BI gives organizations the ability to ask questions in plain language and get
answers they can understand.

• BI helps organizations become data-driven enterprises, improve
performance, and gain competitive advantage.

• Some of the top business intelligence benefits include:
• Data clarity
• Increased efficiency
• Better customer experience
• Improved employee satisfaction
• ……

BI Applications
BI is applied in a number of areas, including health care, education, and finance.
• Coca-Cola Bottling Company
➢ Automated reporting processes
• Lowe's Corp
➢ Optimize its supply chain, analyze products to identify potential fraud, and
solve problems with collective delivery charges from its stores.
• Charles Schwab (financial services firm)
➢ Bring its branch data into a comprehensive view, to understand performance
metrics and identify areas of opportunity


Data is the New Oil

➢ Accuracy of Data

➢ Accessibility of data
when we need it

Modern Data Ecosystem
Figure: Data Sources, Enterprise Data Environment, Users
• Data integrated from disparate sources
• Different types of analysis and skills to generate insights
• Active stakeholders to collaborate and act on insights generated
• Tools, applications, and infrastructure to store, process, and disseminate data
as required

Start With Data Sources
• Text, images, videos, social media, IoT devices, etc.
• Structured data vs. unstructured data
• Reliability, security, and integrity of the data
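
To make the difference between source types concrete, the sketch below reads one structured record (CSV), one semi-structured record (JSON), and one unstructured text snippet. The sample values, field names, and helper functions are hypothetical and chosen only for illustration.

```python
import csv
import io
import json

# Hypothetical samples of the three kinds of source data.
structured = "id,name\n1,Ada\n"                                # fixed schema (CSV / table rows)
semi_structured = '{"id": 2, "name": "Bo", "tags": ["vip"]}'   # self-describing, flexible fields
unstructured = "Customer Cy called about a late delivery."     # free text, no schema

def from_csv(text):
    """Structured data: every row has the same, known columns."""
    return [{"id": int(r["id"]), "name": r["name"]} for r in csv.DictReader(io.StringIO(text))]

def from_json(text):
    """Semi-structured data: optional fields such as "tags" may be missing."""
    doc = json.loads(text)
    return [{"id": doc["id"], "name": doc["name"], "tags": doc.get("tags", [])}]

def from_text(text):
    """Unstructured data: no fields to read, so only basic metadata is recorded."""
    return [{"raw": text, "word_count": len(text.split())}]

records = from_csv(structured) + from_json(semi_structured) + from_text(unstructured)
for record in records:
    print(record)
```

Structured sources can be loaded almost directly, while unstructured sources usually need extra processing (for example text mining) before they can be analyzed.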

Enterprise Data Environment
Figure: Data Sources, Users

• Raw data needs to be organized, cleaned up, and optimized for access, and must
conform to the compliance requirements and standards enforced in the organization.
• Data Management
• Repositories that provide high availability,
flexibility, accessibility, and security

Key Players in the Data Ecosystem
• Data Professionals
• Data Engineers
• Data Analysts
• Data Scientists
• Business Analysts
• Business Intelligence Analysts

Data Engineer
• Maintain data architectures and make data available for business
operations and analysis

• Responsibilities
• Extract, integrate, and organize data from disparate sources
• Clean, transform, and prepare data
• Design, store, and manage data in data repositories

• Skillsets
• Good knowledge of programming
• Sound knowledge of systems and technology architectures
• In-depth understanding of relational databases and non-relational datastores

Data Analyst
• Translates data into plain language so that organizations can make decisions

• Responsibilities
• Inspect and clean data for deriving insights
• Identify correlations, find patterns, and apply statistical methods to analyze and
mine data
• Visualize data to interpret and present the findings of data analysis

• Skillsets
• Good knowledge of spreadsheets, writing queries, and using statistical tools to
create charts and dashboards
• Programming skills
• Strong analytical and story-telling skills

Data Scientist
• Responsibilities
• Analyze data for actionable insights
• Create predictive models using Machine Learning or Deep Learning
techniques

• Skillsets
• Knowledge of Mathematics, Statistics
• Understanding of programming languages, databases, and building data
models
• Domain knowledge

Business Analyst and BI Analyst
Business Analyst
• Leverage the work of Data Analysts and Data Scientists to look at
possible implications for their business and the actions they need to take
or recommend

BI Analyst
• Focus on the market forces and external influences that shape their
business
• Organize and monitor data on different business functions
• Explore data to extract actionable insights that improve business
performance
Data Professionals Summary
• Data Engineers convert raw data into usable data

• Data Analysts use this data to generate insights

• Data Scientists use Data Analytics and Data Engineering to predict the
future using data from the past

• Business Analysts and BI Analysts use these insights and
predictions to drive decisions that benefit and grow their business

What is Data Engineering?
• The field of Data Engineering concerns itself with the mechanics for the
flow and access of data. Its goal is to make quality data available for fact-
finding and data-driven decision making
Figure: Data Sources, Users

Field of Data Engineering (1)
Collect source data
• Extract, integrate, and organize data from disparate sources
• Data acquisition from multiple sources
• Data architecture for storing source data

Process data
• Clean, transform, and prepare data to make it usable
• Distributed systems for processing of data
• Pipelines for the extracting, transforming, and loading of data
• Solutions for safeguarding quality, privacy, and security of data
• Performance optimization
• Adhere to compliance guidelines
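
The collect and process stages above can be sketched as a tiny extract-transform-load (ETL) pipeline. This is a minimal sketch using only the Python standard library; the sample CSV text, the `orders` table, and the cleaning rules are assumptions for illustration, and an in-memory SQLite database stands in for a real data repository.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file, API, or database.
RAW_CSV = """order_id,customer,amount
1, Alice ,120.50
2,Bob,
3,alice,80.00
"""

def extract(text):
    """Extract: read raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim and normalize names, drop rows with a missing amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # simple data-quality rule: skip incomplete records
        cleaned.append((int(row["order_id"]), row["customer"].strip().title(), float(amount)))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Keeping extract, transform, and load as separate functions mirrors how pipeline tools split the work into stages that can be tested and monitored independently.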

Field of Data Engineering (2)
Store data
• Storing data for reliable and easy availability of data
• Data stores for the storage of processed data
• Scalable system
• Ensure data privacy, security, compliance, monitoring, backup, and recovery

Make data available
• Making data available to users securely
• APIs, services, and programs for retrieving data for end-users
• User access through interfaces and dashboards
• Checks and balances to ensure data security

Data engineering is a team sport! No one is Superman!

Skillsets Exploration (1)
Technical Skills
• Operating Systems (UNIX, Linux, and Windows)
• administrative tools, system utilities and commands
• Infrastructure components
• virtual machines, networking, application services, and cloud-based services
• Databases and Data Warehouses
• Oracle, MySQL, PostgreSQL, Redis, MongoDB, Cassandra, Neo4j, Exadata, DB2 Warehouse, Redshift, etc.
• Data Pipelines
• Apache Beam, Airflow, Dataflow, etc.
• ETL Tools
• IBM InfoSphere Information Server, AWS Glue, Improvado, etc.
• Languages
• Query languages (SQL and SQL-like), programming languages (Python, R, and Java), shell and scripting languages
(Unix/Linux shell and PowerShell)
• Big Data Processing Tools
• Hadoop, Hive, and Spark

Skillsets Exploration (2)
Functional Skills
• Convert business requirements into technical specifications
• Work with the complete software development lifecycle
• Ideation, architecture, design, prototyping, testing, deployment, and monitoring
• Understand data’s potential application in business
• Understand risks of poor data management

Soft Skills
• Interpersonal skills, teamwork, collaboration, effective communication

What is Data Architecture?
• Wikipedia
• Data architecture (DA) consists of models, policies, rules, and standards that
govern which data is collected and how it is stored, arranged, integrated, and
put to use in data systems and in organizations.

• IBM
• A data architecture describes how data is managed, from collection through
to transformation, distribution, and consumption. It sets the blueprint
for data and the way it flows through data storage systems.

DA in Data Management

In terms of data management subject areas, Data Architecture
shows how each subject area fits into the overall data management
framework

Example: a Generic Analytical DA

Example: Azure Based Modern DA

Data Architecture Components
• Data pipelines. A data pipeline is the process in which data is collected, moved, and refined. It
includes data collection, refinement, storage, analysis, and delivery.
• Cloud storage. Not all data architectures leverage cloud storage, but many modern data
architectures use public, private, or hybrid clouds to provide agility.
• Cloud computing. In addition to using cloud for storage, many modern data architectures
make use of cloud computing to analyze and manage data.
• APIs. Modern data architectures use APIs to make it easy to expose and share data.
• AI and ML models. AI and ML are used to automate systems for tasks such as data collection,
labeling, etc. At the same time, modern data architectures can help organizations unlock the
ability to leverage AI and ML at scale.
• Data streaming. Data streaming is flowing data continuously from a source to a destination
for processing and analysis in real-time or near real-time.
• Container orchestration. A container orchestration system such as open-source Kubernetes is
often used to automate software deployment, scaling, and management.
• Real-time analytics. The goal of many modern data architectures is to deliver real-time
analytics, the ability to perform analytics on new data as it arrives in the environment.
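
As a rough illustration of data streaming and real-time analytics, the sketch below processes events one at a time and updates a rolling average as each value arrives. The `sensor_stream` source, its readings, and the window size of 3 are hypothetical; a production system would typically read from a streaming platform rather than a Python generator.

```python
from collections import deque

def sensor_stream():
    """Hypothetical source: yields (timestamp, value) events one at a time."""
    readings = [10, 12, 11, 30, 13]
    for t, value in enumerate(readings):
        yield t, value

def rolling_average(events, window=3):
    """Emit the average of the last `window` values as each event arrives."""
    recent = deque(maxlen=window)  # old values fall out automatically
    for t, value in events:
        recent.append(value)
        yield t, sum(recent) / len(recent)

results = list(rolling_average(sensor_stream()))
for t, avg in results:
    print(f"t={t} rolling_avg={avg:.2f}")
```

Because both functions are generators, each event is analyzed as soon as it is produced instead of waiting for a complete batch, which is the core idea behind near real-time analytics.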
