
The DIKW hierarchy, illustrated with a traffic-light example:
Applied – Wisdom – I’d better stop the car.
Context – Knowledge – The traffic light I am headed towards has turned red.
Meaning – Information – The south traffic light on the corner of y mall has turned red.
Raw – Data – Red.

By measurement scale: Quantitative – Discrete (dice rolls, number of people) or Continuous (height, weight, temperature); Qualitative – Nominal, Ordinal, Textual.
By origin: Structured (relational databases, spreadsheets), Semi-structured, Unstructured (images, audio, video).
By source: Primary – collected directly from the source by the researcher, offering high control and accuracy; Secondary – already collected, with less control and accuracy.
By time: Cross-sectional data – represents a single point in time, like a snapshot; Time-series data – captures data over a period, allowing analysis of trends and changes.
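The measurement scale determines which summary statistics are meaningful. A minimal Python sketch with made-up sample values (all variable names and data are illustrative):

```python
from collections import Counter
from statistics import mean

# Hypothetical sample variables, one per measurement scale
dice_rolls = [1, 3, 3, 6, 2]          # quantitative, discrete
heights_cm = [170.2, 165.5, 180.1]    # quantitative, continuous
colours = ["red", "blue", "red"]      # qualitative, nominal
ratings = ["low", "high", "medium"]   # qualitative, ordinal

# The mean makes sense only for quantitative data,
mean_height = mean(heights_cm)
# the mode is the natural summary for nominal data,
top_colour = Counter(colours).most_common(1)[0][0]
# and ordinal data can be ranked via its explicit ordering.
order = {"low": 0, "medium": 1, "high": 2}
ranked = sorted(ratings, key=order.get)

print(mean_height, top_colour, ranked)
```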

Data has 7 Vs:
Volume – data is generated in high volume. EG: Social media platforms like Facebook generate several petabytes of data daily.
Variety – data comes in all forms: structured, semi-structured, or unstructured. EG: A retail business collects data in various forms, including transaction records (structured), customer feedback (unstructured), and clickstream data (semi-structured).
Velocity – the speed at which data is generated. EG: Stock exchanges process millions of transactions and market data in real time.
Veracity – how much of the data is good and how much is bad; data is not always correct, and at times it can be noisy. EG: Sensor data used for monitoring machinery must be accurate to prevent costly downtime.
Value – the usefulness and relevance of the data, measured by the benefits and outcomes it delivers. EG: Netflix uses viewing data to make decisions on which original content to produce.
Visualization – presenting data through charts, graphs, maps, and dashboards. EG: Dashboards in business intelligence tools represent sales data visually.
Variability – the unpredictability and inconsistency of data. EG: Twitter hashtag trends can cause sudden spikes in data volume and sentiment.

Business analysis – a professional discipline focused on identifying business needs and determining solutions to business problems.
Business analytics – refers to the skills, technologies, and practices for iterative exploration and investigation of past business performance to gain insight and drive business planning. It focuses on developing new insights and understanding of business performance based on data and statistical methods. It makes extensive use of analytical modeling and numerical analysis, and includes predictive modeling and fact-based management to drive decision making.
Decision analytics – supports human decisions with visual analytics that the user models to reflect reasoning.
Descriptive – gains insights from historical data with reporting, scorecards, clustering, etc.
Predictive – employs predictive modelling using statistical and machine learning techniques.
Prescriptive – recommends decisions using optimization and simulation.

Database – an organized collection of information stored in and read electronically from a computing system. This structured information is often retrieved, managed, controlled, and arranged to perform various data processing operations.
RDBMS stands for Relational Database Management System. It is a type of database management system that stores data in tables with rows and columns. Each table has a unique key, and these keys can be used to connect data from different tables. RDBMSs use Structured Query Language (SQL) for accessing and managing the data.
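A minimal sketch of the RDBMS idea using Python's built-in sqlite3 module: two tables linked by a key and queried with SQL. The schema, table names, and data here are invented for illustration.

```python
import sqlite3

# In-memory SQLite database with two tables connected by a key.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL, "
            "FOREIGN KEY (customer_id) REFERENCES customers(id))")
cur.execute("INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben')")
cur.execute("INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 99.5), (12, 2, 40.0)")

# The key lets us join rows from both tables in one SQL query.
rows = cur.execute(
    "SELECT c.name, SUM(o.amount) FROM customers c "
    "JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
print(rows)  # [('Asha', 349.5), ('Ben', 40.0)]
conn.close()
```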

Cross-Industry Standard Process for Data Mining (data mining being the extraction of relevant patterns from stored data):
Business Understanding – define the project objectives from a business perspective. EG: A bank wants to improve its credit scoring system to reduce the risk of loan defaults.
Data Understanding – collect initial data and understand its properties. EG: The bank gathers historical loan data, including customer demographics, credit history, loan details, and repayment records.
Data Preparation – clean and prepare the data for modeling. EG: The bank cleans the data by handling missing values, outliers, and errors. It also creates new features that might be predictive of loan default, such as debt-to-income ratio.
Modeling – select and apply various modeling techniques. EG: The bank uses statistical models like logistic regression or machine learning algorithms like random forests to predict the probability of default.
Evaluation – evaluate the models to find the best one. EG: The bank evaluates models based on their accuracy, precision, recall, and AUC, and selects the model that best balances risk and profitability.
Deployment – deploy the model into the operational system. EG: The chosen model is integrated into the bank’s loan approval process, providing real-time credit scoring for loan applicants.
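The evaluation metrics named above (accuracy, precision, recall) can be sketched in a few lines of Python, using hypothetical predicted vs. actual loan outcomes (1 = default; the numbers are invented):

```python
# Hypothetical model predictions vs actual loan outcomes (1 = default).
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 1]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)

accuracy  = (tp + tn) / len(actual)  # overall hit rate
precision = tp / (tp + fp)           # of predicted defaults, how many were real
recall    = tp / (tp + fn)           # of real defaults, how many were caught
print(accuracy, precision, recall)
```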

Data gathering & storage – ETL is a data integration process used in data warehousing: Extracting data from various source systems; Transforming the data into a suitable format, which may include cleansing, aggregating, and restructuring; Loading the transformed data into a target system like a data warehouse. EG: A company might use ETL to gather data from different sales and marketing systems, transform this data to align with a unified schema, and then load it into a central data warehouse for analysis and reporting.
ELT is similar to ETL but with a different order of operations: Extracting data from source systems; Loading the raw data directly into the target system; Transforming the data as needed within the target system. EG: A business could extract customer data from various online platforms and load it directly into a cloud-based data warehouse. Once in the warehouse, the data is transformed using SQL queries to prepare it for business intelligence tools.
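The Extract-Transform-Load steps above can be sketched with the Python standard library, using an invented in-memory CSV source and an SQLite table standing in for the warehouse:

```python
import csv
import io
import sqlite3

# Extract: read raw rows from a CSV source (a string here; normally a file or API).
raw = "date,region,sales\n2024-01-01,North,100\n2024-01-01,South,bad\n2024-01-02,North,150\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: cleanse (drop rows with non-numeric sales) and convert types.
clean = [(r["date"], r["region"], float(r["sales"]))
         for r in rows if r["sales"].replace(".", "", 1).isdigit()]

# Load: insert the transformed rows into the target warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (date TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # 250.0
```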

The Knowledge Discovery Process:
Data Cleaning – remove noise and inconsistent data. EG: Amazon cleans customer review data by removing duplicates and correcting misspellings.
Data Integration – combine data from different sources. EG: Amazon integrates data from its various services, like Amazon Prime, Amazon Music, and AWS, into a centralized data repository.
Data Selection – select the relevant data for analysis. EG: Amazon selects customer purchase history and browsing data for analysis to improve product recommendations.
Data Transformation – transform data into a suitable format for mining. EG: Amazon transforms the selected data into a structured format that can be used by machine learning algorithms.
Data Mining – apply algorithms to extract patterns. EG: Amazon uses association rule mining to discover items that are frequently bought together.
Pattern Evaluation – identify truly interesting patterns. EG: Amazon evaluates the patterns to find those that can predict future buying behaviour with high confidence.
Knowledge Representation – present the knowledge in an understandable way. EG: Amazon represents the discovered knowledge through its recommendation system, suggesting products to customers based on their behaviour.
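A simplified stand-in for the association rule mining step: counting item pairs and their support over a handful of invented transactions. A full algorithm such as Apriori would additionally prune by a minimum support threshold and derive rules with confidence.

```python
from collections import Counter
from itertools import combinations

# Invented market baskets; each set is one transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"bread", "butter", "jam"},
]

# Count how often each item pair appears together in a basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = fraction of transactions containing the pair.
support = {pair: n / len(baskets) for pair, n in pair_counts.items()}
best_pair, best_support = max(support.items(), key=lambda kv: kv[1])
print(best_pair, best_support)  # ('bread', 'butter') 0.75
```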

A data mart is a subset of a data warehouse, often oriented to a specific business line or team. It contains data relevant to a particular group, such as sales, finance, or marketing. EG: A marketing team might use a data mart containing only marketing data to analyse campaign performance without accessing the entire data warehouse.
A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. It’s designed to store big data and allows for the storage of both structured and unstructured data. EG: A company might use a data lake to store all of its data, including web logs, social media data, and sensor data, for potential big data analytics.
A data warehouse is a centralized repository for all the data that an enterprise’s various business systems collect. It is designed for query and analysis rather than transaction processing and usually contains historical data derived from transaction data. EG: An enterprise might use a data warehouse to store data from various sources like CRM, ERP, and finance systems for business intelligence and reporting.
OLAP is a category of software that allows users to analyse information from multiple database systems at the same time. It’s used for complex calculations, trend analysis, and sophisticated data modeling. EG: A financial analyst might use OLAP to perform multidimensional analysis on sales data across different regions and time periods to identify trends and forecast future performance.
An OLAP cube is a data structure that allows fast analysis of data according to the multiple dimensions that define a business problem. It’s a key feature of OLAP that enables it to perform complex calculations and analyses quickly. EG: A retail company might use an OLAP cube to analyse sales data across dimensions like time, product categories, stores, and regions to make informed stocking and promotional decisions.
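A toy illustration of the cube idea in plain Python: totals keyed by dimension tuples, with a roll-up and a slice over invented sales facts. Real OLAP engines precompute and index these aggregates rather than scanning rows.

```python
from collections import defaultdict

# Invented fact rows: (region, product, quarter, sales).
facts = [
    ("North", "Shoes", "Q1", 100),
    ("North", "Shoes", "Q2", 120),
    ("North", "Hats",  "Q1", 50),
    ("South", "Shoes", "Q1", 80),
    ("South", "Hats",  "Q2", 70),
]

# A tiny in-memory "cube": totals keyed by every dimension combination.
cube = defaultdict(int)
for region, product, quarter, sales in facts:
    cube[(region, product, quarter)] += sales

# Roll-up: aggregate away the quarter dimension to get region x product.
rollup = defaultdict(int)
for (region, product, _), sales in cube.items():
    rollup[(region, product)] += sales

# Slice: fix one dimension (region = North) and sum over the rest.
north_total = sum(s for (r, _p), s in rollup.items() if r == "North")
print(dict(rollup), north_total)
```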
