What Are Critical Success Factors?
Key areas of activity in which favorable results are necessary for a company to attain its goals.
Industry CSFs
Strategy CSFs
Environmental CSFs
Temporal CSFs
Data cubes are commonly used for easy interpretation of data. A cube represents data along
dimensions against some measure of business interest; each dimension of the cube represents
some attribute of the database, e.g. profit per day, month, or year.
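The idea of rolling a measure up along the attributes of a dimension can be sketched in plain Python. The records and the roll_up helper below are hypothetical, not part of any particular cube engine; the sketch only shows how the same profit measure aggregates at different levels of a date dimension.

```python
from collections import defaultdict

# Hypothetical sales records: the date dimension is broken into
# attributes (year, month, day), and profit is the measure.
rows = [
    {"year": 2023, "month": 1, "day": 5, "profit": 100},
    {"year": 2023, "month": 1, "day": 6, "profit": 150},
    {"year": 2023, "month": 2, "day": 1, "profit": 200},
    {"year": 2024, "month": 1, "day": 9, "profit": 300},
]

def roll_up(rows, dims):
    """Aggregate the profit measure along the given dimension attributes."""
    totals = defaultdict(int)
    for r in rows:
        key = tuple(r[d] for d in dims)
        totals[key] += r["profit"]
    return dict(totals)

profit_per_year = roll_up(rows, ["year"])            # {(2023,): 450, (2024,): 300}
profit_per_month = roll_up(rows, ["year", "month"])  # finer granularity, same data
```

Dropping an attribute from `dims` corresponds to rolling the cube up to a coarser level; adding one drills down.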
Data cleaning is a process that ensures a set of data is correct and accurate. Data accuracy,
consistency, and integration are checked during data cleaning. Data cleaning can be applied
to a single set of records or to multiple sets of data that need to be merged.
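As a minimal sketch of what checking accuracy on merged record sets can look like: the function below drops records with missing or blank fields and removes duplicates left over from merging two sources. The sample records are made up for illustration.

```python
def clean(records):
    """Drop records with missing/blank values and remove duplicates,
    keeping the first occurrence."""
    seen = set()
    out = []
    for r in records:
        if any(v is None or (isinstance(v, str) and not v.strip()) for v in r.values()):
            continue  # inaccurate record: a field is missing or blank
        key = tuple(sorted(r.items()))
        if key in seen:
            continue  # exact duplicate produced by merging data sets
        seen.add(key)
        out.append(r)
    return out

merged = [
    {"id": 1, "name": "Ann"},
    {"id": 1, "name": "Ann"},   # duplicate after merging two sources
    {"id": 2, "name": None},    # missing value
    {"id": 3, "name": "  "},    # blank value
]
cleaned = clean(merged)         # only the first record survives
```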
An extension of data mining can be used to slice the data of the source cube while mining
discovered data. The case table is dimensioned at the time the cube is mined.
Data mining is a logical process of searching large amounts of information to find
important data. It proceeds in three stages:
Stage 1: Exploration. The data is explored and prepared. The goal of the exploration
stage is to find the important variables and determine their nature.
Stage 2: Pattern identification. The primary action in this stage is to search for patterns
and choose the one that allows the best prediction.
Stage 3: Deployment. This stage cannot be reached until a consistent, highly predictive
pattern is found in stage 2. The pattern found in stage 2 is then applied to see whether
the desired outcome is achieved.
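The exploration stage above (finding the important variables) can be sketched with a simple statistic. The example below ranks two made-up candidate variables by the absolute value of their Pearson correlation with an outcome; real exploration uses many more techniques, so this is only one illustrative instance.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical exploration data: two candidate variables and an outcome.
ad_spend = [1, 2, 3, 4, 5]
store_id = [3, 1, 4, 1, 5]
sales    = [10, 20, 30, 40, 50]

# Stage 1: rank candidate variables by correlation with the outcome.
scores = {
    "ad_spend": abs(pearson(ad_spend, sales)),
    "store_id": abs(pearson(store_id, sales)),
}
# ad_spend tracks sales perfectly here, so it is the "important" variable.
```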
6. What are the different problems that “Data mining” can solve?
Data mining can be used in a variety of fields and industries, such as marketing of products
and services, AI, and government intelligence.
The US FBI uses data mining to screen security and intelligence information, identifying
illegal and incriminating e-information distributed over the internet.
Deleting data from a data warehouse is known as data purging. Usually junk data, such as
rows with null values or spaces, is cleaned up.
A BUS schema identifies the common dimensions across business processes, i.e., the
conforming dimensions. It consists of conformed dimensions and a standardized definition of facts.
Non-additive facts are facts that cannot be summed up across any of the dimensions present
in the fact table. These columns cannot be added to produce any meaningful result.
A conformed fact in a warehouse is allowed to have the same name in separate tables, so that
the facts can be compared and combined mathematically. Conformed dimensions can be used
across multiple data marts; they have a static structure. Any dimension table that is used by
multiple fact tables can be a conformed dimension.
In real-time data warehousing, the warehouse is updated every time the system performs a
transaction, so it reflects the real-time business data. This means that when a query is fired
against the warehouse, the state of the business at that moment is returned.
Lookup tables, using the primary key of the target, allow records to be updated based on the
lookup condition.
Define slowly changing dimensions (SCD)?
SCDs are dimensions whose data changes very slowly, e.g., a city or an employee.
A row of such a dimension can be handled in one of three ways: it can be replaced completely
without any track of the old record, a new row can be inserted alongside the old one, or the
change can be tracked within the existing row.
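The first two options above are commonly called Type 1 and Type 2 handling; a minimal sketch of both, using a made-up employee dimension with hypothetical column names (`key`, `city`, `current`, `start_date`, `end_date`):

```python
from datetime import date

def scd_type1(dim_rows, key, new_attrs):
    """Type 1: overwrite in place; no trace of the old record is kept."""
    for row in dim_rows:
        if row["key"] == key:
            row.update(new_attrs)
    return dim_rows

def scd_type2(dim_rows, key, new_attrs, change_date):
    """Type 2: close the current row and insert a new one, keeping history."""
    for row in dim_rows:
        if row["key"] == key and row["current"]:
            row["current"] = False
            row["end_date"] = change_date
            new_row = {**row, **new_attrs, "current": True,
                       "start_date": change_date, "end_date": None}
            dim_rows.append(new_row)
            break
    return dim_rows

employees = [{"key": 1, "city": "Pune", "current": True,
              "start_date": date(2020, 1, 1), "end_date": None}]
scd_type2(employees, 1, {"city": "Mumbai"}, date(2023, 6, 1))
# employees now holds the closed Pune row plus the current Mumbai row.
```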
A set of similar cubes built by a transformer is known as a cube grouping. Cube groupings are
generally used to create smaller cubes based on the data at a level of a dimension.
A data warehouse can be considered a storage area where relevant data is stored irrespective
of the source.
Data warehousing merges data from multiple sources into an easy and complete form.
A virtual data warehouse provides a collective view of the completed data. It can be considered
a logical data model of the metadata it contains.
An active data warehouse represents a single state of the business. It considers the analytic
perspectives of customers and suppliers, and helps to deliver the updated data through reports.
Data modeling is a technique used to define and analyze the requirements of data that support
an organization's business process. In simple terms, it is used for the analysis of data objects
in order to identify the relationships among these data objects in any business.
Data warehousing relates to all aspects of data management, starting from the development,
implementation, and operation of the data sets. It is a backup of all data relevant to the
business (a data store).
Business Intelligence is used to analyze data from the business point of view, to measure an
organization's success.
Factors such as sales, profitability, marketing campaign effectiveness, market share, and
operational efficiency are analyzed using Business Intelligence tools like Cognos and
Informatica.
Snapshot refers to a complete visualization of data at the time of extraction. It occupies less
space and can be used to back up and restore data quickly.
ETL means extracting data from different sources such as flat files, databases, or XML data,
transforming this data depending on the application's needs, and loading it into a data warehouse.
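The three ETL steps can be sketched end to end in a few lines. Everything here is illustrative: the flat file is an in-memory CSV, the transformation rules (type casting, name normalization) are made up, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Extract: read a flat file (an in-memory CSV for this sketch).
flat_file = io.StringIO("id,name,salary\n1,ann,50000\n2,bob,60000\n")
rows = list(csv.DictReader(flat_file))

# Transform: apply the application's rules (cast types, normalize names).
transformed = [(int(r["id"]), r["name"].title(), float(r["salary"])) for r in rows]

# Load: insert into the warehouse table (in-memory SQLite here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary REAL)")
db.executemany("INSERT INTO employee VALUES (?, ?, ?)", transformed)
loaded = db.execute("SELECT name FROM employee ORDER BY id").fetchall()
```

In a real pipeline each step is usually a separate, restartable job; collapsing them into one script is only for readability.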
Data mining is a method for comparing large amounts of data for the purpose of finding patterns.
It is normally used for models and forecasting.
Data warehousing provides the central repository for the data of several business systems in an
enterprise. Data from various sources is extracted and organized in the data warehouse
selectively for analysis and accessibility.
Applications that support and manage transactions involving high volumes of data are
supported by an OLTP system. OLTP is based on client-server architecture and supports
transactions across networks.
Business data analysis and complex calculations on low volumes of data are performed by
OLAP. With the support of OLAP, a user can gain an insight into data coming from various
sources.
What are cubes?
Cubes are multidimensional data structures through which Analysis Services provides a
combined view of the data used in OLAP or data mining.
The sequence clustering algorithm collects similar or related paths and sequences of data
containing events.
The time series algorithm can be used to predict continuous values of data. Once the algorithm
is trained to predict a series of data, it can predict the outcome of other series, e.g.,
forecasting profit.
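As a stand-in for a real time series algorithm (which would typically fit an autoregressive model), the sketch below forecasts the next value of a made-up monthly profit series with a simple moving average. It illustrates the idea of predicting a continuous value from a trained window, not any specific product's algorithm.

```python
def moving_average_forecast(series, window):
    """Predict the next value as the mean of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)

# Hypothetical monthly profit figures.
monthly_profit = [100, 110, 120, 130, 140, 150]
next_month = moving_average_forecast(monthly_profit, window=3)  # (130+140+150)/3
```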
What is XMLA?
XMLA stands for XML for Analysis. It is an industry standard for accessing data in analytical
systems, such as OLAP.
A surrogate key is a unique identifier in the database, either for an entity in the modeled
world or for an object in the database. A surrogate key is internally generated by the current
system and is invisible to the user. As several objects may be available in the database
corresponding to a surrogate, the surrogate key cannot be utilized as the primary key.
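A minimal sketch of the "internally generated, invisible to the user" property: the hypothetical Dimension class below hands out system-generated integer surrogates for natural (business) keys, and the same entity always maps back to the same surrogate.

```python
import itertools

class Dimension:
    """Assigns a system-generated surrogate key to each natural (business) key."""
    def __init__(self):
        self._next = itertools.count(1)   # the system, not the data, generates keys
        self._keys = {}

    def surrogate_for(self, natural_key):
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._next)
        return self._keys[natural_key]

dim = Dimension()
k1 = dim.surrogate_for("EMP-007")   # first entity seen -> 1
k2 = dim.surrogate_for("EMP-042")   # second entity seen -> 2
k3 = dim.surrogate_for("EMP-007")   # same entity -> same surrogate, 1
```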
A tracking process or status collection can be performed using factless fact tables. A factless
fact table does not have numeric values to aggregate.
Granularity is the lowest level of information stored in the fact table; the depth of the data
level is known as the granularity.
E.g., in the date dimension the level of granularity could be year, quarter, month, period,
week, or day.
A snowflake schema is a more normalized form of a star schema. In a star schema, one fact table
is stored with a number of dimension tables, and each dimension table is independent, without
any sub-dimensions. In a snowflake schema, a dimension table can be normalized into multiple
sub-dimensions.
View:
• Tailored data representation is provided by a view to access data from its table.
• Has a logical structure and does not occupy storage space.
Materialized view:
• Stores pre-calculated data physically.
• Has a physical structure and occupies storage space.
Linked cubes are cubes that are linked in order to keep the data consistent.
4. Input/output bugs:-
Valid values not accepted
Invalid values accepted
5. Calculation bugs:-
Mathematical errors
Final output is wrong
9. H/W bugs:-
Device is not responding to the application
1) Constraint Testing:
In the constraint testing phase, the test engineer identifies whether the data is mapped from
source to target correctly.
The test engineer checks the following constraints in the ETL testing process:
a) NOT NULL
b) UNIQUE
c) Primary Key
d) Foreign key
e) Check
f) Default
g) NULL
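Two of the checks above (NOT NULL and UNIQUE/primary key) can be sketched as plain queries against the target. The table, column names, and sample rows below are hypothetical; an in-memory SQLite database stands in for the loaded target.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical target table as loaded by an ETL job (with two defects).
db.execute("CREATE TABLE target (id INTEGER, email TEXT)")
db.executemany("INSERT INTO target VALUES (?, ?)",
               [(1, "a@x.com"), (2, None), (2, "b@x.com")])

# a) NOT NULL check: rows that violate the rule.
null_violations = db.execute(
    "SELECT COUNT(*) FROM target WHERE email IS NULL").fetchone()[0]

# b/c) UNIQUE / primary-key check: key values that occur more than once.
dup_keys = db.execute(
    "SELECT id FROM target GROUP BY id HAVING COUNT(*) > 1").fetchall()
```

A real test suite would run one such query per declared constraint and fail the load if any count is non-zero.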
2) Source to Target Count Testing:
Here the tester checks whether the record counts in source and target match. Whether the rows
are in ascending or descending order does not matter; only the count is required.
A tester can fall back on this type of testing when time is short.
NOTE: also check the order of the columns and the mapping of source columns to target columns.
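A sketch of the count comparison, again against hypothetical in-memory tables; note the target rows are deliberately in a different order, which does not affect the check.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE source (id INTEGER, name TEXT)")
db.execute("CREATE TABLE target (id INTEGER, name TEXT)")
db.executemany("INSERT INTO source VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
# Same rows, different order: count testing does not care about order.
db.executemany("INSERT INTO target VALUES (?, ?)", [(3, "c"), (1, "a"), (2, "b")])

src_count = db.execute("SELECT COUNT(*) FROM source").fetchone()[0]
tgt_count = db.execute("SELECT COUNT(*) FROM target").fetchone()[0]
counts_match = (src_count == tgt_count)
```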
Note:
1) If there are mistakes in the primary key, or no primary key is allotted, duplicates may
arise.
2) Sometimes a developer makes mistakes while transferring the data from source to target, and
duplicates arise.
3) Duplicates also arise due to environment mistakes (improper plugins in the tool).
14) Retesting:
Re-executing the failed test cases after fixing the bug.
Project
Here I am taking the emp table as an example. For this I will write test scenarios and test
cases; that means we are testing the emp table.
Note: ODS information would contain cleansed data only, i.e., data coming after the staging area.
Staging Area :-
It comes after the ETL has finished. The staging area consists of:
1. Metadata.
2. The work area where we apply our complex business rules.
3. The place where we hold the data and do calculations.
In other words, we can say that it is a temporary work area.
The full form of ODS is Operational Data Store. ODS is a layer between the source and target
databases; it is used to store the most recent data.
The staging layer is also a layer between the source and target databases; it is used for
cleansing purposes and stores the data periodically.
ODS (Operational Data Store) is the first point in the data warehouse. It stores the real-time
data of daily transactions as the first instance of the data.
The staging area is the later part, which comes after the ODS. Here the data is cleansed and
temporarily stored before being loaded into the data warehouse.
ODS is an Operational Data Store which contains real-time data (because we should not apply
changes directly on real-time data). So we dump the real-time data into the ODS, also called
the landing area; later we get the data into the staging area, which is where we do all the
transformations.