Topic 1 - Data Architecture For BI
Northeastern University
Acknowledgments: Material mainly based on BI Guidebook by Rick Sherman and Data Engineering Course on Coursera
Zheng Zheng Ph.D. Designing Advanced Data Architectures for Business Intelligence 1
Outline
➢Basic Information
➢Course Syllabus
Lecturer
• Zheng Zheng, Ph.D.
➢Assistant Teaching Professor
➢INFO 6205, INFO 5100, INFO 5002, DAMG 7370
➢Chief Scientist
➢TCTM Kids IT Education Inc. (NASDAQ: TCTM)
➢Chief Technology Officer (CTO)
➢TechArena Canada Inc. (TCTM Canada branch)
• Education
➢Ph.D., Computer Science, McMaster University, Canada
➢M.Eng, Computer Technology, University of Chinese Academy of Sciences, China
• Email: zh.zheng@northeastern.edu
Course Details
• Course Description
Focuses on designing advanced data architectures supporting structured,
unstructured, and semi-structured data sources; hybrid integration and data
engineering; and analytical uses by casual information consumers, power users,
and data scientists. Technologies include databases (relational, columnar, in-
memory, and NoSQL); hybrid data, application, and cloud integration; data
preparation; data virtualization; descriptive, diagnostic, predictive, and prescriptive
analytics; and on-premises and cloud deployments.
• Prerequisites
Students should have basic knowledge of database management systems
and basic programming skills (especially in Python). Students are assumed to
know Python and SQL well enough to understand all code used in
the slides and textbook.
Learning Objectives (1)
Students should know and understand:
• Data architecture
• Data engineering lifecycle
• Data modeling and dimensional modeling
• Data integration and pipelines
• Data warehousing
• BI dimensional modeling
• BI application, design and development
Learning Objectives (2)
Students should be able to:
• Collect and analyse requirements of the project
• Select and design the data repository for the data
• Develop the architectural framework
• Design the proper approach for BI data models
• Integrate the data
• Generate advanced analytics with data mining and machine
learning techniques
• Deal with “shadow systems”
• Manage the full data engineering lifecycle
Resources and Textbooks
Textbooks
• Business Intelligence Guidebook: From Data Integration to Analytics, 1st Edition, by Rick Sherman, 2014
• Data Mining: Concepts and Techniques, 3rd Edition, by Jiawei Han, Micheline Kamber, and Jian Pei, 2011
Resources
• Data Engineering Course and Machine Learning Course on Coursera
Course Outline
• Topic 1 – Data Architecture for BI
• Topic 5 – Python for Data Engineering
Evaluation
• One assignment (15%)
Group Project
• A data architecture for business intelligence
• Requirements Collection
• Data ETL
• Data Storage
• Data Mining
• Data Visualization
• Requirements
• Each group consists of 2-3 students
• Can use any data engineering tools
Recent Interview Questions
What is BI?
• Wikipedia
• BI comprises the strategies and technologies used by enterprises for the data
analysis and management of business information.
• Tableau
• Business intelligence combines business analytics, data mining, data visualization,
data tools and infrastructure, and best practices to help organizations make more
data-driven decisions.
• Microsoft Power BI
• BI uncovers insights for making strategic decisions. Business intelligence tools analyze
historical and current data and present findings in intuitive visual formats.
History of BI (1)
• In 1865
• Richard Millar Devens first used the term BI
• In 1958
• Hans Peter Luhn, “A Business Intelligence System”
An automatic system is being developed to disseminate information
to the various sections of any industrial, scientific, or
government organization. This intelligence system will
utilize data-processing machines for auto-abstracting and
auto-encoding of documents and for creating interest
profiles for each of the action points in an organization.
History of BI (2)
• In 1989
• Howard Dresner proposed BI as an umbrella term for
concepts and methods to improve business decision
making by using fact-based support systems
• In the 1990s
• A number of BI vendors started appearing in the market
• 2000 onwards
• BI started becoming self-service
Traditional BI vs. Modern BI
Traditional BI
➢ A common approach for regular reporting and answering static queries.
Cycle of analytics
➢ A cycle of data access, discovery, exploration, and information sharing.
Modern BI
➢ Interactive and approachable.
➢ Users can visualize data and answer their own questions.
How Does BI Work?
Step 1: Collect and transform
data from multiple sources
Why Is BI Important?
• BI gives organizations the ability to ask questions in plain language and get
answers they can understand.
BI Applications
BI is applied in many areas, including health care, education, and finance. Examples:
➢Coca-Cola Bottling Company
➢Automated reporting processes
➢Lowe's Corp
➢Optimize its supply chain, analyze products to identify potential fraud, and
solve problems with collective delivery charges from its stores.
➢Charles Schwab (financial services firm)
➢Bring its branch data into a comprehensive view, to understand performance
metrics and identify areas of opportunity
Data is the New Oil
➢ Accuracy of Data
➢ Accessibility of data
when we need it
Modern Data Ecosystem
[Figure: Data Sources → Enterprise Data Environment → Users]
• Data integrated from disparate sources
Start With Data Sources
• Text, images, videos, social media, IoT devices, etc.
• Structured Data vs. Unstructured Data
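As a rough illustration of the structured vs. unstructured distinction, the sketch below handles one structured source (CSV rows with a fixed schema), one semi-structured record (JSON whose fields may be missing), and one piece of unstructured free text. All sample values and field names are made up for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, every row follows the same schema.
csv_text = "order_id,amount\n1001,25.50\n1002,40.00\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total = sum(float(r["amount"]) for r in rows)

# Semi-structured: self-describing keys; fields may be nested or missing.
event = json.loads('{"device": "sensor-7", "reading": {"temp_c": 21.5}}')
temp = event["reading"]["temp_c"]
humidity = event["reading"].get("humidity")  # absent in this record, so None

# Unstructured: free text with no schema; any structure must be inferred.
review = "Delivery was late but the product is great"
word_count = len(review.split())

print(total, temp, humidity, word_count)
```

The structured rows can be summed directly because the schema is known in advance, while the free text needs extra processing before it can be analyzed.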
Enterprise Data Environment
[Figure: the enterprise data environment sits between Data Sources and Users]
Key Players in the Data Ecosystem
• Data Professionals
• Data Engineers
• Data Analysts
• Data Scientists
• Business Analysts
• Business Intelligence Analysts
Data Engineer
• Maintain data architectures and make data available for business
operations and analysis
• Responsibilities
• Extract, integrate, and organize data from disparate sources
• Clean, transform, and prepare data
• Design, store, and manage data in data repositories
• Skillsets
• Good knowledge of programming
• Sound knowledge of systems and technology architectures
• In-depth understanding of relational databases and non-relational datastores
Data Analyst
• Translates data into plain language so that organizations can make decisions
• Responsibilities
• Inspect and clean data for deriving insights
• Identify correlations, find patterns, and apply statistical methods to analyze and
mine data
• Visualize data to interpret and present the findings of data analysis
• Skillsets
• Good knowledge of spreadsheets, writing queries, and using statistical tools to
create charts and dashboards
• Programming skills
• Strong analytical and story-telling skills
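To make "identify correlations" concrete, here is a minimal sketch that computes the Pearson correlation coefficient with the standard library only. The helper name `pearson` and the monthly figures are hypothetical; an analyst would more likely use a spreadsheet or a statistics package.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical monthly figures: advertising spend vs. units sold.
ad_spend = [10, 20, 30, 40, 50]
units_sold = [12, 25, 31, 48, 55]

r = pearson(ad_spend, units_sold)
print(round(r, 3))
```

A value close to 1 suggests the two series move together. It does not by itself show that one causes the other.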
Data Scientist
• Responsibilities
• Analyze data for actionable insights
• Create predictive models using Machine Learning or Deep Learning
techniques
• Skillsets
• Knowledge of Mathematics, Statistics
• Understanding of programming languages, databases, and building data
models
• Domain knowledge
Business Analyst and BI Analyst
Business Analyst
• Leverage the work of Data Analysts and Data Scientists to look at
possible implications for their business and the actions they need to take
or recommend
BI Analyst
• Focus on the market forces and external influences that shape their
business
• Organize and monitor data on different business functions
• Explore data to extract actionable insights that improve business
performance
Data Professionals Summary
• Data Engineers convert raw data into usable data
• Data Scientists use Data Analytics and Data Engineering to predict the
future using data from the past
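"Predicting the future using data from the past" can be sketched, under strong simplifying assumptions, as fitting a straight line to past observations and extrapolating one step ahead. The function `fit_line` and the sales figures are hypothetical. Real predictive models are usually built with machine learning libraries and validated on held-out data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical past data: sales for months 1 to 4.
months = [1, 2, 3, 4]
sales = [100.0, 110.0, 120.0, 130.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 5 + intercept  # extrapolate to month 5
print(forecast)
```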
What is Data Engineering?
• The field of Data Engineering concerns itself with the mechanics for the
flow and access of data. Its goal is to make quality data available for fact-
finding and data-driven decision making
[Figure: data flows from Data Sources to Users]
Field of Data Engineering (1)
Collect source data
• Extract, integrate, and organize data from disparate sources
• Data acquisition from multiple sources
• Data architecture for storing source data
Process data
• Clean, transform, and prepare data to make it usable
• Distributed systems for processing of data
• Pipelines for the extracting, transforming, and loading of data
• Solutions for safeguarding quality, privacy, and security of data
• Performance optimization
• Adhere to compliance guidelines
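The collect and process steps above can be sketched as a tiny extract, transform, load (ETL) pipeline. This is a minimal illustration that assumes a CSV export as the source and an in-memory SQLite database as the target. The sample data, table name, and function names are hypothetical, and production pipelines typically rely on dedicated tooling.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (stray spaces, one blank amount).
RAW_CSV = "customer,amount\n alice ,10.5\nbob,\nCarol,7\n"

def extract(text):
    """Read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Clean names, convert amounts to float, and drop incomplete rows."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip records with a missing amount
        cleaned.append((row["customer"].strip().title(), float(amount)))
    return cleaned

def load(records, conn):
    """Write the prepared records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
result = conn.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(result)
```

Keeping the three stages as separate functions makes each one easy to test and replace, for example swapping the CSV source for an API call.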
Field of Data Engineering (2)
Store data
• Store data so that it is reliably and easily available
• Data stores for the storage of processed data
• Scalable system
• Ensure data privacy, security, compliance, monitoring, backup, and recovery
Skillsets Exploration (1)
Technical Skills
• Operating Systems (UNIX, Linux, and Windows)
• administrative tools, system utilities and commands
• Infrastructure components
• virtual machines, networking, application services, and cloud-based services
• Databases and Data Warehouses
• Oracle, MySQL, PostgreSQL, Redis, MongoDB, Cassandra, Neo4j, Exadata, DB2 Warehouse, Redshift, etc.
• Data Pipelines
• Apache Beam, Airflow, Dataflow, etc.
• ETL Tools
• IBM InfoSphere Information Server, AWS Glue, Improvado, etc.
• Languages
• Query languages (SQL and SQL-like), Programming languages (Python, R, and Java), Shell and Scripting languages
(Unix/Linux Shell and PowerShell)
• Big Data Processing Tools
• Hadoop, Hive, and Spark
Skillsets Exploration (2)
Functional Skills
• Convert business requirements into technical specifications
• Work with the complete software development lifecycle
• Ideation, architecture, design, prototyping, testing, deployment, and monitoring
• Understand data’s potential application in business
• Understand risks of poor data management
Soft Skills
• Interpersonal skills, teamwork, collaboration, effective communication
What is Data Architecture?
• Wikipedia
• Data architecture (DA) consists of models, policies, rules, and standards that
govern which data is collected and how it is stored, arranged, integrated, and
put to use in data systems and in organizations.
• IBM
• A data architecture describes how data is managed, from collection through
to transformation, distribution, and consumption. It sets the blueprint
for data and the way it flows through data storage systems.
DA in Data Management
Example: a Generic Analytical DA
Example: Azure Based Modern DA
Data Architecture Components
• Data pipelines. A data pipeline is the process in which data is collected, moved, and refined. It
includes data collection, refinement, storage, analysis, and delivery.
• Cloud storage. Not all data architectures leverage cloud storage, but many modern data
architectures use public, private, or hybrid clouds to provide agility.
• Cloud computing. In addition to using cloud for storage, many modern data architectures
make use of cloud computing to analyze and manage data.
• APIs. Modern data architectures use APIs to make it easy to expose and share data.
• AI and ML models. AI and ML are used to automate systems for tasks such as data collection,
labeling, etc. At the same time, modern data architectures can help organizations unlock the
ability to leverage AI and ML at scale.
• Data streaming. Data streaming is flowing data continuously from a source to a destination
for processing and analysis in real-time or near real-time.
• Container orchestration. A container orchestration system such as open-source Kubernetes is
often used to automate software deployment, scaling, and management.
• Real-time analytics. The goal of many modern data architectures is to deliver real-time
analytics, the ability to perform analytics on new data as it arrives in the environment.
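The data streaming and real-time analytics components can be illustrated with a small sketch in which a Python generator stands in for a continuous source and a running mean is updated as each value arrives. The readings and function names are hypothetical. A real deployment would read from a streaming platform rather than a fixed list.

```python
def sensor_stream():
    """Stand-in for a continuous source; a real stream would not end."""
    for value in [20.0, 22.0, 21.0, 25.0]:
        yield value

def running_mean(stream):
    """Yield the mean of all values seen so far, updated per event."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count

means = list(running_mean(sensor_stream()))
print(means)
```

The aggregate is available after every event instead of after a nightly batch, which is the core idea behind analytics on data as it arrives.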