
ETL-BI TESTING

Applications are divided into two categories.


1. Transactional Application
2. Analytical Application

1. Transactional applications:

A transactional application is responsible for controlling and running the fundamental business processes.


Most applications are transactional applications.
Ex. Amazon, PhonePe, Facebook
What is business process?
Business process of amazon:
 Create user account
 User login
 Search product
 Addto cart/buy now
 Delivery address
 Payment details
 Order placed
Amazon follows the above business process to generate revenue.

2. Analytical applications:
An analytical application is responsible for analysis and review, to understand or improve a product, a
process, or a business.
It generates reports based on previous (historical) data.
Ex. Suppose I have an Airtel plan of Rs. 598 which gives 84 days of unlimited calls and 1.5 GB of
data per day, and I get a 5% discount on this plan if I recharge through the PhonePe app. If the
company sells this plan with a 5% discount, the number of customers increases by 7%; if it sells
the plan with a 10% discount, the number of customers increases by 10-15%. Based on this type
of analysis, the company improves its business process.
Only the business team uses analytical applications.
Ex. call log analytics (Worldometers.info/corona)
Q…..Why do we need a data warehouse?
1. Performance may be hampered if reports are created from the transactional
application database:
While using any application, performance is the primary concern, so it is not suggested
to create reports and run the fundamental business process from the same database; it
may hamper the performance of the system. To overcome this problem, we use
another database, also called the data warehouse / analytical application
database. This database is totally dedicated to reporting purposes.

The data from the transactional application database is copied to another database, i.e. the data
warehouse, so the analysis workload and the transactional workload are separated into two different
databases, and the two databases hold different data. (In these notes, "server" is used loosely to mean the database.)
2. Development and design are time consuming and difficult when we have multiple
data stores

If we have data in multiple places, creating reports is difficult and time
consuming. The standard, good practice is to create a central repository, i.e. a data
warehouse, and store all the data in that data warehouse. Then we have all the data in a
single place, and creating reports from that place is not difficult. When data is
stored in different databases, creating reports from those different databases or multiple
data sources is difficult, and development and design are time consuming.
3. Data is not fit for analysis

We are not changing the meaning of the data; we are only changing its format. We store
the data in a single uniform format while loading it into the data warehouse, i.e. we convert
the data into a single uniform format before storing it in the data warehouse. We
manipulate the data according to our need by bringing it into the one uniform format we want. This
conversion of the data is called data transformation. For data transformation, i.e. converting and
manipulating the data, we need a data warehouse. The data warehouse stores the data in one single
uniform format, and that is what is displayed in the reports.

Q…What is Data warehouse?


1. It is a database designed for querying and analysis rather than for transaction processing.
2. It separates analysis workload from transaction system.
3. This helps in:
i. Maintaining historical records
ii. Analyzing the data to gain a better understanding of the business and to improve the
business.
4. A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data
for the decision-making process.
A data warehouse is a copy of the transactional application database.
Q…What are the characteristics of a Data warehouse?
A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data for
the decision-making process.
Subject Oriented: it is used to analyze a particular subject area.
Integrated: the data warehouse accepts data from different sources.
Time variant: historical data is usually maintained in a data warehouse, i.e. retrieval can be
for any period. In a transactional system only the most recent/current data is maintained, but in the
data warehouse both recent/current and previous/historical data are maintained.
Non-Volatile: once the data is placed in the data warehouse, it cannot be changed, which
means we will never be able to change the data.

Q…Can we create reports directly from the transactional application database?


Yes, but: 1. performance of the system may be hampered
2. development and design are time consuming
3. data is not fit for analysis

The transactional application database is also called the source or OLTP.


The data warehouse is also called the target or OLAP.

Q….What is the architecture / technical flow / data flow / high-level design of a data warehouse?
There are four layers in DWH architecture:
1.Data Source Layer
2.Data Staging Area
3. Data Storage Layer
4. Reporting Layer
Data Source Layer: refers to the various data stores in multiple formats like relational
databases, flat files, Excel files, XML files etc. These stores hold business data such as Sales,
Customer, Finance, Product etc.
After that, the next step is Extract, where the required data from the data source layer is extracted and
put into the data staging area.
Data Staging Area: an intermediate layer between the Data Source Layer and the Data Storage Layer,
used for processing data during the ETL process. It minimises the chances of data loss.
The staging area is basically used to hold the data and to perform data transformations before
loading the data into the data warehouse.
The actual transformation of transactional data into analytical data is done in the data staging
area.
Data Storage Layer: i.e. the data warehouse, the place where the successfully cleaned, integrated,
transformed and ordered data is stored in a multi-dimensional environment.
Now the data is available for analysis and query purposes.
Reporting Layer: in this layer, the data in the data storage layer is used to create various types of
management reports, from which users can take business decisions for planning, designing,
forecasting etc.
Meta Data Repository: metadata is nothing but data about data. The metadata repository is used
to store metadata about the data which is actually present in the data warehouse, i.e. the data storage layer.
The metadata repository works like an index.
Data Mart: a data mart can be defined as a subset of the data warehouse. A data mart is focused on
a single functional area, e.g. product, customers, employees, sales, payment etc. It is a subject-
oriented database.

Q…What do you mean by ODS (Operational Data Store)?


- Between the staging area and the data warehouse, the ODS serves as a repository for data.
From the ODS, all the data is loaded into the DW (data warehouse).
- The benefits of an ODS mainly apply to business operations, as it presents current,
clean data from multiple sources in one place. Unlike other databases, an ODS database
is read-only, and customers cannot update it.

Q…Explain data mart.


- A data warehouse can be divided into subsets, also called data marts, which are focused on
a particular business unit or department. Data marts allow selected groups of users to
easily access specific data without having to search through an entire data warehouse.
- In contrast to data warehouses, each data mart has a unique set of end users, and building
a data mart takes less time and costs less, so it is more suitable for small businesses.
- There is no duplicate (or unused) data in a data mart, and the data is updated on a regular
basis.
Q…what is ETL?
1. ETL is considered an important component of data warehousing architecture.
2. ETL stands for Extract-Transform-Load.
Extract is the process of reading data from a source database/ transactional system.
Transform is the process of converting the extracted data to required form.
Load is the process of writing the data into the target database/ analytical system.
3. It is a process which defines how data is loaded from the source system to the target system
(data warehouse).

Q…Explain the three-layer architecture of an ETL cycle.


Typically, ETL tool-based data warehouses use staging areas, data integration layers, and access
layers to accomplish their work. In general, the architecture has three layers as shown below:

- Staging Layer: in the staging layer, or source layer, the data extracted from
multiple data sources is stored.
- Data Integration Layer: the integration layer transforms the data from the
staging layer and moves it into the database layer.
- Access Layer: also called the dimension layer, it allows users to retrieve data for
analytical reporting and information retrieval.

OLTP (Online Transaction Processing System):


1. OLTP is nothing but a database which stores the daily transactions created
from one or more applications.
2. Data in OLTP is called current data.
3. Mostly normalized data is used in an OLTP system.
OLAP (Online Analytical Processing System):
1. OLAP is used to store analytical data.
2. It deals with analyzing the data for decision making, planning, designing etc.
3. Data in OLAP is called historical data.
4. Mostly de-normalized data is used in an OLAP system.

Q…difference between OLTP & OLAP / transactional system vs analytical system


1. OLTP: Online transactional processing system. OLAP: Online analytical processing system.
2. OLTP: it is the original source of data. OLAP: data comes from the various OLTP systems.
3. OLTP: holds current data. OLAP: holds current and historical data.
4. OLTP: stores all business data. OLAP: stores data relevant for reporting.
5. OLTP: has small databases. OLAP: has large databases.
6. OLTP: contains volatile data (Create, Read, Update, Delete). OLAP: contains non-volatile data (Read).
7. OLTP: data in normalized form. OLAP: data in de-normalized form.
8. OLTP: used to control and run fundamental business tasks. OLAP: deals with analyzing the data for decision making, planning and designing.
9. OLTP: has many users. OLAP: has few users.
10. OLTP: the processing time of a transaction is less. OLAP: the processing time of a transaction is more.
11. OLTP: queries are complex. OLAP: queries are simple.

Q…Can you define cubes and OLAP cubes?


- Cubes are data processing units composed of fact tables and dimensions from the data
warehouse. They provide multi-dimensional analysis.
- OLAP stands for Online Analytical Processing, and an OLAP cube stores large data in multi-
dimensional form for reporting purposes. It consists of facts, called measures,
categorized by dimensions.
Full dependency:
a non-key column depends on the whole primary key, so the data does not need to be stored in
separate tables.

Partial dependency:
a non-key column depends on only part of the primary key, so the data can be stored in
separate tables.

Transitive dependency: A -> B -> C


A -> B and B -> C, hence A -> C

Redundancy:
saving the same type of data again and again.
Partial dependency creates redundancy.

Q…What is Normalization?
Normalization is the process of efficiently organizing the data in the database. It is done by the
database architect.
Normalization is used to minimize redundancy. It is also used to eliminate undesirable
characteristics like insertion, updation and deletion anomalies.
Normalization divides a larger table into smaller tables and links them using relationships.
Ex. a denormalized table and its normalized form (the original notes show example tables here; a sketch follows below).
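A minimal sketch, using hypothetical Customer/Order tables (not from the original notes), of how one denormalized table can be split into normalized tables:

-- Denormalized: customer details are repeated on every order row (redundancy).
CREATE TABLE order_denorm (
    order_id    INT,
    customer_id INT,
    cust_name   VARCHAR(50),
    cust_city   VARCHAR(50),
    product     VARCHAR(50),
    quantity    INT
);

-- Normalized: customer details are stored once and linked to orders by a key.
CREATE TABLE customer (
    customer_id INT PRIMARY KEY,
    cust_name   VARCHAR(50),
    cust_city   VARCHAR(50)
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT FOREIGN KEY REFERENCES customer(customer_id),
    product     VARCHAR(50),
    quantity    INT
);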


Rules of normalization: 1NF, 2NF, 3NF, BCNF(boyce-codd normal form), 4NF

1st Normal Form (1NF): it states that an attribute of a table cannot hold multiple values; each
attribute must hold only a single value.

Second Normal Form (2NF):


For a table to be in the Second Normal Form, it should be in the First Normal Form and it
should not have any partial dependency.
Third Normal Form (3NF):
A table is said to be in the Third Normal Form when it is in the Second Normal Form and it
doesn't have any transitive dependency.

Q…..What is normalized data and denormalized data?


Normalized data: normalization is the technique of dividing data into multiple tables to reduce
data redundancy and preserve data integrity.
Denormalized data: denormalization is the technique of combining the data into a single
table.

Q…Have you ever seen de-normalized data and its normalized form?
Yes.
(The original notes show an example here: a de-normalized table and its normalized equivalent.)

Q… difference between normalized and de-normalized data


1. Normalized: normalization is the technique of dividing data into multiple tables. De-normalized: denormalization is the technique of combining the data into a single table.
2. Normalized: design is complex. De-normalized: design is simple.
3. Normalized: performance of the system increases. De-normalized: performance of the system is degraded.
4. Normalized: memory consumption is less. De-normalized: memory consumption is more.
5. Normalized: development is more time consuming. De-normalized: development is less time consuming.
6. Normalized: used to reduce data redundancy (duplication). De-normalized: used to achieve faster execution of queries by introducing redundancy.
7. Normalized: queries are complex. De-normalized: queries are simple.


Example query against the normalized design (join required):
Select car_colour.colour_name
from car_colour
join car on car.colour_id = car_colour.colour_id
where car.car_id = 102 ;

Example query against the de-normalized design (single table):
Select colour
from car
where car_id = 102 ;

Schema: a logical partition in a database.
Default schema name (SQL Server): dbo

Q…What do you mean by schema objects?


Generally, a schema comprises a set of database objects such as tables, views, indexes, clusters,
database links, synonyms, procedures, packages etc.
A schema object is a logical description or structure of the database.
Schema objects can be arranged in various ways in schema models designed for data
warehousing. Star and snowflake schemas are two examples of data warehouse schema models.

Data models :
- A data model describes how the logical structure of a database is designed.
- Data models define how data is connected to each other and how it will be processed and stored
inside the system.
- Types of Data Models:
i. Conceptual Data Model
ii. Logical Data Model
iii. Physical Data Model

Conceptual Data Model ( made by Architect) :


A conceptual data model is the high-level design of the database.
Features of a conceptual data model include:
1. Displays the important entities and the relationships among them.
2. No attributes are specified.
3. No primary keys are specified.

Logical Data Model ( made by Architect) :


A logical data model defines the data in as much detail as possible, to show how it can be physically
implemented in the database.
i. Includes all entities and relationships among them.
ii. All attributes/columns for each entity/table are specified.
iii. The primary key for each entity is specified.
iv. Foreign keys (keys identifying the relationship between different entities) are specified.
v. Constraints are defined. (Unique, Not null, Check, default etc..)

Physical Data Model (made by Developer) :


The actual implementation of the logical model in the database is called the physical data model.
Q… difference between relational data model and dimensional data model
1. Relational data modelling: suitable for OLTP applications. Dimensional data modelling: suitable for OLAP applications.
2. Relational: many tables with chain relationships among them. Dimensional: fixed structure, with a central table called the fact table surrounded by dimension tables.
3. Relational: create/read/update/delete operations are performed. Dimensional: select (read) operations are performed.
4. Relational: normalization is suggested. Dimensional: de-normalization is suggested.

Q…difference between logical data model vs physical data model


1. Logical data model: includes entities, attributes and their relationships. Physical data model: includes tables, columns, keys, data types, database triggers, procedures and access control.
2. Logical: created by the data architect and business analyst. Physical: created by the database administrator and developers.
3. Logical: the objective is a technical map of rules and data structures. Physical: the objective is to implement the actual database.
4. Logical: simpler than the physical data model. Physical: more complex than the logical data model.

Q…What is a fact (measure) and a dimension?


A fact is a counted or measured event.
Example row: Order id = ON-45673, Product = Laptop, Date = 24-03-2022, Address = Pune, Quantity = 9, Total price = 120000.
Here quantity and total price are facts, and the remaining columns are dimensions.
- A fact is always a number, but every number is not a fact.
- Numbers which are not countable/measurable are not facts (e.g. Aadhaar card number, PAN card number, mobile number).
Dimension: descriptive information about a fact is called a dimension.

Q…What is a fact table and a dimension table?


A fact table consists of the facts or measures of a business process.
- It is the central table, surrounded by dimension tables.
- A fact table consists of two types of columns:
1. columns that contain facts
2. foreign keys to the dimension tables
A dimension table is used for storing dimensions; a dimension table also contains a primary key.
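A minimal sketch, using hypothetical dim_product/fact_sales tables (not from the original notes), of a dimension table and a fact table linked by a foreign key:

-- Dimension table: descriptive attributes, with a surrogate primary key.
CREATE TABLE dim_product (
    product_key  INT PRIMARY KEY,
    product_name VARCHAR(50),
    category     VARCHAR(50)
);

-- Fact table: measures plus foreign keys to the dimension tables.
CREATE TABLE fact_sales (
    product_key INT FOREIGN KEY REFERENCES dim_product(product_key),
    date_key    INT,           -- would reference a date dimension table
    quantity    INT,           -- fact / measure
    total_price DECIMAL(12,2)  -- fact / measure
);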

Q…What do you mean by a factless table?


- Factless tables do not contain any facts or measures. They contain only dimension keys
and deal with event occurrences at the informational level but not at the calculational
level.
- As the name implies, factless fact tables capture relationships between dimensions but
lack any numeric measures.
- Factless fact tables can be categorized into two categories: those that describe events, and
those that describe conditions. Both may have a significant impact on your
dimensional modeling.

Q…Define Grain of Fact.


Grain of fact refers to the level of detail at which fact information is stored. Alternatively, it is known
as fact granularity.

Q…What are the types of dimensional model schemas?


1. Star schema
2. Snowflake schema
3. Galaxy schema

1..Star schema:
 It is the simplest form of dimensional model.
 In a star schema design, the central table is called the fact table and the radially connected tables are
called dimension tables.
 It is known as a star schema because the entity-relationship diagram looks like a star.
 Dimension tables in a star schema are in de-normalized form.
 A star schema is good for data marts with simple relationships.
In a typical star schema example (shown as a diagram in the original notes), the fact table is at the center and
contains keys to every dimension table, like Dealer_ID, Model_ID, Date_ID, Product_ID, Branch_ID, and other
attributes like units sold and revenue.
2..Snowflake schema:
 The process of normalizing the dimension tables is called snowflaking.
 The ER diagram of this schema looks like a snowflake, so it is called a snowflake
schema. The snowflake schema is an extension of the star schema.
 Dimension tables are in normalized form.
 Advantages
1. Data integrity issues are reduced because of the structured data.
2. Data is highly structured, so it requires little disk space.
3. Updating or maintaining snowflaked tables is easy.

 Disadvantages
1. Snowflaking reduces the space consumed by dimension tables, but the space
saved is usually insignificant compared with the entire data warehouse.
2. Due to the number of tables added, you may need complex joins to
perform a query, which will reduce query performance.

Q… difference between star schema and snowflake schema


1. Star schema: chosen when the dimension tables contain fewer rows. Snowflake schema: chosen when the dimension tables contain a large number of rows.
2. Star: good for data marts with simple relationships. Snowflake: good for a data warehouse, to simplify complex relationships.
3. Star: simple DB design. Snowflake: complex DB design.
4. Star: both the fact table and the dimension tables are in de-normalized form. Snowflake: the fact table is in de-normalized form but the dimension tables are in normalized form.
5. Star: contains one dimension table for each dimension. Snowflake: contains more than one dimension table for each dimension.
6. Star: high level of data redundancy. Snowflake: low level of data redundancy.

Q…What is a galaxy schema?


 It contains two or more fact tables that share dimension tables between them.
 It is also called a Fact Constellation Schema.
 The schema is viewed as a collection of stars, hence the name Galaxy Schema.
 In a galaxy schema, the shared dimensions are called conformed dimensions.
Types of facts:

Facts are categorized according to how they behave with respect to the dimensions in the fact
table.

1…Additive facts:

Additive facts are facts that can be summed up across all of the dimensions in the fact table.

2…Non-additive facts:

Non-additive facts are facts that cannot be summed up across any of the dimensions present in the
fact table. A ratio is an example of a non-additive fact.
3…Semi-additive facts:
Semi-additive facts are facts that can be summed up across some of the dimensions in the
fact table, but not all.

Q…Types of Dimensions
Dimensions are categorized according to how the data is stored in the data
warehouse / data mart / database.

1. Slowly changing dimensions


2. Conformed dimensions
3. Degenerated dimensions
4. Junk dimensions

1..Slowly changing dimension:


These are dimensions that change slowly over a period of time, rather than changing on a regular
schedule. A Slowly Changing Dimension (SCD) is a dimension that stores and manages both
current and historical data over time in a data warehouse.
It is considered and implemented as one of the most critical ETL tasks in tracking the history
of dimension records.
There are several approaches to dealing with SCDs:
- SCD0 (Type 0 - the passive method)
- SCD1 (Type 1 - overwriting the old value)
- SCD2 (Type 2 - creating a new additional record)
- SCD3 (Type 3 - adding a new column)
- SCD4 (Type 4 - using a historical table)
- SCD6 (Type 6 - combines the approaches of types 1, 2 and 3 (1+2+3=6))

SCD0:
- In type 0, no special action is performed upon dimensional changes.
- Once we enter data into the table, it cannot be changed. If we try to change the data,
an error is shown; the error appears while executing the ETL code.

SCD1:
- Old data is replaced with the new data; history is not stored.
- This type is easy to maintain and is used for data whose changes are caused by processing
corrections (e.g. removal of special characters, correcting spelling errors).
SCD2:
- A new row is created for the new data in the same table.
- Current data and history data are present in the same table.

SCD3:
- A new column is added.
- Current and history data are kept in the same table and the same row.
- History is limited: it stores history, but only the previous value.
- Storing of history data depends on the column structure maintained in the table.
- This is the least commonly needed technique.
SCD4:
- Separate tables are used for current data and history data.
- The 'main' table keeps only the new (current) data.
- A separate historical table is used to hold all historical changes for each of the dimensions.

SCD6:
- It is a combination of types 1, 2 and 3 (1+2+3=6).
- In this type we have additional columns in the dimension table such as:
Current_Address, Current_Year: for keeping the current value of the attribute.
Previous_Address, Previous_Year: for keeping the historical value of the attribute.
Current_Flag: for keeping information about the most recent record.
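A minimal sketch, using a hypothetical dim_customer table and columns (not from the original notes), contrasting an SCD1 overwrite with an SCD2 insert that preserves history:

-- SCD1: overwrite the old value; no history is kept.
UPDATE dim_customer
SET city = 'Mumbai'
WHERE customer_id = 101;

-- SCD2: expire the current row and insert a new row, so history is kept.
UPDATE dim_customer
SET end_date = GETDATE(), current_flag = 'N'
WHERE customer_id = 101 AND current_flag = 'Y';

INSERT INTO dim_customer (customer_key, customer_id, city, start_date, end_date, current_flag)
VALUES (5002, 101, 'Mumbai', GETDATE(), NULL, 'Y');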
Q…In your current project, which SCD types are used?
SCD1 & SCD2, but I am aware of the other SCD types also.

Q…Why are the other SCD types not used?


We use SCD types according to the client requirement. Mostly the client requirement covers
SCD1 and SCD2.

Q…Which SCD types store history?


SCD2, SCD3, SCD4, SCD6

Q…Which SCD types do not store history?


SCD0 & SCD1

Q…difference between SCD2 and SCD3


1. SCD2: current data and history data are stored in the same table, in separate rows. SCD3: current data and history data are stored in the same table, in the same row.
2. SCD2: no limitation on storing history data. SCD3: only limited history data can be stored; storing of history data depends on the column structure maintained in the table.
3. SCD2: it is a commonly used type of SCD. SCD3: it is a rarely used type of SCD.

Q…difference between SCD2 and SCD4


1. SCD2: current data and history data are stored in the same table. SCD4: current data is stored in the current-data table and history data is stored in a separate history table.
2. SCD2: no limitation on storing history data. SCD4: also no limitation on storing history data.
3. SCD2: it is a commonly used type of SCD. SCD4: it is a rarely used type of SCD.
4. SCD2: more size is required for storing data. SCD4: less size is required for storing data.

2..Conformed dimension:
Conformed dimensions are dimensions which have been designed in such a way that
the dimension can be used across many fact tables (data marts) in different subject areas of the data
warehouse. In other words, shared dimensions are called conformed dimensions.

3..Degenerated dimension:
Degenerated dimensions are dimensions which are present directly in the fact table, not in a
separate dimension table.

4..Junk dimension:
When a group of independent dimensions is stored in a separate dimension table, those
dimensions are called a junk dimension.
Business key:
A business key uniquely identifies a row in the table.
Business keys are a good way of avoiding duplicate records.
When we do not want the data in the table to change, we use a business key.

Q…Why does a database table need a Primary Key?


1) As good practice in database design, we should maintain a primary key in every table.
2) A primary key is used to uniquely identify each row/record in a table.

Natural key:
1. When the primary key is made up of real data in the table, that primary key is called a natural primary key.
2. For example, in an HR database the Employees table has an 'Employee_id' column which is unique
and not null, and we can call this real data because every employee is identified by their
Employee_id.
3. Here we can make Employee_id the primary key of the Employees table, and this primary key is
called a natural primary key.
Surrogate key:
1. Sometimes we cannot make a primary key from the real data in a database table.
2. In this situation, we have to add one artificial column to the table which has unique, not
null values, and make this column the primary key of the table.
3. The primary key which is generated from this artificial column is called a surrogate key.
Ex.
If we have to maintain history in an employee table, then Emp_id should not be the primary key
column of the table, because there is a chance of duplicates in the Emp_id column.
To resolve this primary key problem in the employee table, we add an EMP_KEY
column which has unique, not null values, and we make EMP_KEY the primary key of the
employee table. This primary key is called a surrogate key.
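A minimal sketch, using a hypothetical employee_dim table (not from the original notes), of a surrogate key generated with an IDENTITY column alongside the natural/business key:

CREATE TABLE employee_dim (
    emp_key  INT IDENTITY(1,1) PRIMARY KEY,  -- surrogate key, generated by the database
    emp_id   INT NOT NULL,                   -- natural/business key coming from the source system
    emp_name VARCHAR(50),
    city     VARCHAR(50)
);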
30 APRIL
Loading types:
1. Initial load
2. Incremental load
3. Full load
1..Initial load: when the source data is loaded into the target system for the first time, that type of load is
called an initial load.
2..Incremental load: only newly added data and existing modified data are loaded into the target
system.
3..Full load: first truncate all the data from the target table and then reload it from the source. It is used
when we want the target to hold only the most recent records. It is a rarely used type of loading.
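A minimal sketch, using hypothetical src_customer/trg_customer tables and a modified_on column (not from the original notes), of how an incremental load and a full load might be written:

-- Incremental load: pick up only rows created or modified since the last load.
INSERT INTO trg_customer (customer_id, customer_name, city, modified_on)
SELECT s.customer_id, s.customer_name, s.city, s.modified_on
FROM src_customer s
WHERE s.modified_on > (SELECT MAX(t.modified_on) FROM trg_customer t);

-- Full load: truncate the target and reload everything from the source.
TRUNCATE TABLE trg_customer;
INSERT INTO trg_customer (customer_id, customer_name, city, modified_on)
SELECT customer_id, customer_name, city, modified_on
FROM src_customer;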
04 may
ETL code:
Extract means reading data from the source.
After extracting data from the source, the ETL code is executed for data transformation.
The ETL code is executed when the source is not busy with other work.
Ex. during banking working hours, the ETL code is not executed;
WhatsApp takes its backup at around 2 at night.
Suppose data arrives in the source on day 1; it is then transferred to the target on day 2.

ETL Job:
For automatic execution of the ETL code, another piece of code is written; that code is called an ETL job.
The ETL job is made by the developer.

Difference between ETL CODE and ETL JOB


ETL CODE is responsible for the migration of data from source to destination.
ETL JOB is responsible for the automatic execution of the ETL CODE.

Tester work related to the ETL job


1. check whether the job run is successful or not
2. how to see the history of job executions
3. how to run the job manually
4. if the job execution fails, how to send the log file to the developer

Why does automatic execution of the ETL code happen? Or, what is the use of an ETL job?


The ETL code is responsible for the migration of data from source to target. The ETL code is executed when the
source is not busy with other work or has less workload. For example, WhatsApp takes its backup at
around 2 at night, because at that time fewer users are on WhatsApp, which means the source has less
workload. So, to avoid manual effort and employee workload at night, automatic execution of the
ETL code is used.

The environment in which the developer works is called the dev environment.
The environment in which the tester works is called the test environment.
The developer and tester work on data called test data.
After that, the application goes to the end users, and that environment is called the production/live
environment.
The data created by end users is called live data or real data.
Real data of one company is not given to another company because of security reasons.

05 may
Mapping document: (also called the God's document / data leakage document)
- The mapping document is made by the architect with the help of the BA (business analyst).
- The mapping document contains complete information about the source and the target.
The template of a mapping document contains the following information:
1. mapping no: for unique identification of each row
2. source table
3. source column
4. data type
5. target table
6. target column
7. data type
8. transformation logic: the logic required for converting data from the format of the source
system into the required format of the destination system
When data from the source is transferred directly into the target without any transformation logic, it is called
direct mapping / straight mapping / 1:1 mapping, e.g. Created_on, Modified_on.
Reference table: identifies from which table the row/data is coming.
ETLupdatetime: at what time the data was updated in the target.
Created_on: at what time the data was created in the source.
Modified_on: when the data was last modified in the source.

What is a mapping document?


1. The data mapping document defines the relationship between source data fields and their related target
data fields involved in the ETL process.
2. In simple terms, the data mapping document is nothing but the map between the source data and the
target data in the ETL process.

Why do ETL Testers need the mapping document?


The ETL tester needs the mapping document because, while testing the data of the target tables, we have to
refer to the data in the source tables, and the mapping document has detailed information about each and every
table that is part of the ETL process from source to target, along with the transformation logic.
06 may
What is ETL?
1. ETL stands for Extract-Transform-Load.
•Extract is the process of reading data from a source system.
•Transform is the process of converting the extracted data to the required form.
•Load is the process of writing the data into the target system.
2. It is a process which defines how data is loaded from the source system to the target
system (data warehouse).

What is ETL Testing?


1. In ETL testing, we validate that the data that has been loaded from the source system
to the data warehouse is uniform in terms of Quality, Quantity and Format.
Quantity: whether the complete data is coming into the data warehouse or not.
Quality: we need to check the quality of the data (i.e. whether there is duplication of data,
whether there is any calculation mistake, and whether rejected data is coming or not).
Format: we need to check the format of the data.

2. In ETL testing, we are validating data which is loaded from the transactional system to the
analytical system.

Purpose of ETL Testing


1. The purpose of ETL testing is to identify and mitigate (reduce/resolve) general data errors
during the ETL process.
2. As ETL testers, we need to ensure data accuracy in the target system, because if the data is not
accurate, then the business decisions will be wrong.

Difference between ETL Testing vs Database Testing


1. ETL Testing: we validate data which is loaded from the transactional system to the analytical system. Database Testing: we check the impact of front-end user operations on the back end.
2. ETL Testing: normally performed on data in a data warehouse system. Database Testing: commonly performed on transactional systems.
3. We have to validate data in both ETL testing and database testing, but the approach, purpose and steps taken for completion are different.
Types of ETL testing:
Q)..Which types of validation have you performed in ETL?
1. Metadata Testing (Verification Testing)
2. Data Completeness Testing (Validation Testing)
3. Data Transformation Testing (Validation Testing)
4. Data Quality Testing (Validation Testing)
5. Security Testing
6. Performance Testing

1..Metadata testing:
- In metadata testing, we have to validate the physical model against its logical model.
- Metadata testing involves verification of:
 Table Name
 Column name
 Column data type
 Column data length
 Constraints
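A minimal sketch of a metadata check (SQL Server INFORMATION_SCHEMA views; the CUSTOMER_TRG table name is reused from the later example and is only illustrative), listing the actual column names, data types and lengths so they can be compared against the mapping document:

SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'CUSTOMER_TRG'
ORDER BY ORDINAL_POSITION;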

2..Data Completeness Testing:


- In data completeness testing, we ensure that all the expected data is loaded into the
target system from the source system.
- Data completeness testing involves:
 Checking and validating the records between the source and target for all
columns.
 Also, checking and validating the count of records with aggregate
functions and filters, against incremental and historical data loads.

-- In data completeness testing, when there is an incremental load (SCD2) in the target system, the counts
of the source and target tables do not match. In this case we have to check only the latest records of the
target table:
SELECT COUNT(*) FROM CUSTOMER_TRG Where CUSTOMERKEY in (Select
MAX(CUSTOMERKEY) From CUSTOMER_TRG Group By CUSTOMERID );

--- In data completeness testing, when we have more than one source table and we have to load all the
source table data into one target table, we use the mapping document, in which the
reference table name is given, for matching the count of source records and target records:

select count(*) from scr1 where sal>1000;                        -- source 1
select count(*) from target123 where reference_table= 'scr1';    -- target

select count(*) from scr2 where sal>1000;                        -- source 2
select count(*) from target123 where reference_table= 'scr2';    -- target

select count(*) from scr3 where sal>1000;                        -- source 3
select count(*) from target123 where reference_table= 'scr3';    -- target

3…Data transformation testing:


- In data transformation testing, we ensure that the source data, converted as per the
requirement, is loaded correctly into the target system (data warehouse).
Q…Which transformations have you worked on in ETL? Please give an example.
- Data transformations can be:
 Concatenation: suppose the source has two columns, first name and last name; they are
transferred into the target as a full name column by using the concatenation operator.
 Joins: two tables of the source are transferred to the target by joining them into one table
in the target system (normalized data is converted into de-normalized data).
 Split: de-normalized data is converted into normalized data (e.g. male data in one table
and female data in another).
 Aggregate transformation: sum, count(*), min, max, avg.
 Data conversions: using case statements (female as F, male as M).
 Derived column transformation: the target data is derived from the source data
(e.g. suppose the source has a hire date, and we have to calculate the experience of the
employee in the target table).
 Filter transformation: some source data is not needed in the target (e.g. esal >= 1000).
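A minimal sketch, using hypothetical emp_src/emp_trg tables and a full_name column (not from the original notes), of validating a concatenation transformation; any row returned indicates a mismatch between the transformed source data and the target:

-- Apply the expected transformation to the source and compare it with the target.
SELECT emp_id, first_name + ' ' + last_name AS full_name
FROM emp_src
EXCEPT
SELECT emp_id, full_name
FROM emp_trg;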
4..Data quality testing:
- In data quality testing, we check the accuracy of the data in the target system.
- Data quality testing involves checking and validating:
 Duplicate data
 Rejected data: data which is in the source but is not needed in the target
- row-level rejection
- column-level rejection
- table-level rejection
 Data validation rules: constraint checking
 Data integrity: data integrity is nothing but checking the foreign keys and primary keys
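A minimal sketch, using a hypothetical customer_trg table (not from the original notes, and assuming a target without SCD2 history rows), of a duplicate-data check; any row returned means the business key is duplicated in the target:

SELECT customer_id, COUNT(*) AS cnt
FROM customer_trg
GROUP BY customer_id
HAVING COUNT(*) > 1;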

There are two tables in the source, table 1 and table 2, linked by a PK and an FK. The data of table 1
is loaded into the target first, because it holds the primary key, and then the table 2 data is loaded
into the target, because it holds the foreign key; the foreign key refers to the data in the table where the
primary key is defined. Table 1 is loaded by job 1 and table 2 by job 2, taking 5 minutes and 7 minutes
respectively. Because the data from table 1 must be loaded first and table 2 only afterwards, each run
completing in turn, it takes time to load the data into the target.
Due to the presence of the primary key and foreign key in the tables:
 We can't insert invalid data
 We can't update to invalid data
 We can't delete any valid data from the table

When there is criticality and a time concern, we remove the foreign key and load the data into both
tables in parallel, with or without a sequence. Any data can be inserted at any time into any table. This
saves time, and we don't need to run the jobs one after the other.

However, we might now insert data into the child table whose parent is not present in the parent table.
We can't make any such changes using invalid data once the foreign key is applied to the table.

If the primary and foreign keys are present, it takes time to run the job, i.e. to insert the data into the
tables. If there are thousands of records, it will take even more time to insert the data into the
tables due to the presence of the primary key and foreign key, because we need to add the data
sequentially into the tables. After removing the foreign key there is no waiting time for data
insertion and no need to add the data sequentially either, so we can insert all the data into the tables
and, after loading all the data, apply the foreign key to the tables. This saves time and removes the need
to follow the sequence. If a foreign key value is present but its parent key is missing, an error is shown,
and we cannot apply the foreign key in this condition.

Those records whose parent is not available in the parent table are called orphan records, and we can
find these records with the query below.
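The query itself is not present in the original notes; a minimal sketch, using hypothetical child_trg/parent_trg tables, of finding orphan records:

-- Orphan records: child rows whose parent key does not exist in the parent table.
SELECT c.*
FROM child_trg c
LEFT JOIN parent_trg p ON c.parent_id = p.parent_id
WHERE p.parent_id IS NULL;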
 Calculation:

Addition, subtraction, multiplication and division calculations need to be checked.


 Data validation rules:

These are nothing but the constraints.

Example: every customer in the system should have an SSN; it is mandatory and should be not null.
If this rule is applied in the source, then it will already hold; but if it is introduced only in the target, then
we need to observe it very carefully. These data validation rules can be handled through constraints or
through coding, and this code can be written on the database side or the ETL side. It is not necessary to
cover these in the metadata testing. Sometimes, if some restriction or validation is there, you need to
put in test data and observe the result to check whether the rule is enforced or not.
5. Performance Testing: Verifying that data loads into the data warehouse within
predetermined timelines to ensure speed and scalability.

General ETL Testing scenarios

1. Validate ETL run (ETL job / ETL code)
2. Validate record counts between source and target
3. Validate null counts between source and target
4. Validate the rejected data in the target
5. Validate the duplicate data in the target
6. Validate 1:1 transformation columns between source and target
7. Validate common transformations (concatenation, join, filter, derived column, split, data conversion)
8. Validate SCD transformations
9. Validate data integrity
10. Validate calculations
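A minimal sketch, using hypothetical emp_src/emp_trg tables and an email column (not from the original notes), of validating the null count of a column between source and target:

SELECT COUNT(*) AS src_null_count FROM emp_src WHERE email IS NULL;
SELECT COUNT(*) AS trg_null_count FROM emp_trg WHERE email IS NULL;
-- The two counts should match, after taking any filter/transformation rules into account.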
What is a Test Scenario?
•A Test Scenario is a statement describing the functionality of the application to be tested.
•It is also called Test Condition or Test Possibility.
•Test Scenario gives the idea of what we have to test.
•Test Scenario is like a high-level test case.
•A single test scenario can cover one or more test cases.

What is a Test Case?


A test case answers "how is it to be tested".
The purpose of a test case is to validate the test scenario by executing a set of steps.

Which is a good test case?


A tester should consider the below things while writing test cases:
1. Test cases should be short.
2. They should be easily understandable.
3. They should include a strong objective/description/actions.
4. They must have an expected result.

What is a positive test case and a negative test case?


Testing with valid input is called a positive test case.
Ex. there is a text box in an application which accepts only numeric values. The
positive test case is to test whether a numeric value is accepted by the text box.
Testing with invalid input is called a negative test case.
Ex. for a text box accepting numeric values, if we test whether the text box accepts a
character value, then it is a negative test case.

The template of a test case contains the following (what is your organization's test case template?):
Test case ID: unique identification id for each test case
Type: whether it is a positive or negative test case
Description/Objective: describes the test objective in brief (test case)
Prerequisite: condition that must be met before the test case can be run, e.g. the user must be logged
in
Actions/Steps: list all the test execution steps in detail
Test data: the test data used as input for this test case; a list of variables and possible values used
in the test case. Examples: login id = {valid login id, invalid login id, valid email, invalid
email, empty}, password = {valid, invalid, empty}
Data validation query :
Expected result: what the system output should be after test execution. Describe the expected
result in detail, including the message/error that should be displayed on the screen.
Actual result: the actual test result, filled in after test execution. Describe the system
behaviour after test execution.
Status: if the actual result is not as per the expected result, then mark this test as failed;
otherwise, update it as passed.

A collection of code is called a software build.

22 may……
Environments in software development and testing:
1. Development / local environment
2. Test environment / SIT / QA (quality assurance) / Staging environment
3. UAT-Alpha environment
4. UAT-Beta / pre-production environment
5. Production / live environment

1….Development environment:
- The development environment is also called the local environment or Software Development
Environment (SDE).
- The development environment is used by developers to build the application.
- In the development environment, developers write the code and test it so that it
can be deployed to the test environment.
- This is an organization-level environment.
- Testing done by the developer is called WBT (white box testing).
- Entry criteria for the development environment:
1. Requirement gathering
2. Review
3. HLD (high-level design)
4. LLD (low-level design)
- Exit criteria for the development environment:
Coding is completed, and code review and WBT are done by the development team.

2….Testing environment:
- The testing environment is used by software testers to test the application.
- Software builds are deployed in the test environment by developers.
- The testing environment is also called the Software Integration Testing environment (SIT
environment) or staging environment.
- Once the testers complete the software testing, the software build is deployed to the UAT
environment.
- This is an organization-level environment.
- Entry criteria for the testing environment:
Coding is completed, and code review and WBT are done by the development team.
- Exit criteria for the testing environment:
Test execution is completed, with defect fixes and their impact verified.

3….UAT-Alpha environment:
- The UAT environment is used by the client-side testing team to test the application.
- The client tests the application before moving it to the next environment.
- Once the UAT testing is done by the client, the application build is deployed to the production
environment.
- This is a client-side environment.
- Testing is done by a technical team and professional testers.
- Entry criteria for the UAT-Alpha environment:
Test execution is completed, with defect fixes and their impact verified.
- Exit criteria for the UAT-Alpha environment:
Test execution is completed, with defect fixes and their impact verified in the UAT-Alpha env.

4….UAT-Beta environment:
- It is also called the pre-production environment.
- It is used by the client / beta users / end-user-like people to test the application.
- The client tests the application before moving it to the production environment.
- Once the testing in the pre-production environment is done by the client, the application build is
deployed to the production environment.
- This is a client-side environment.
- Testing is done by non-professional testers.
- Entry criteria for the UAT-Beta environment:
Test execution is completed, with defect fixes and their impact verified in the UAT-Alpha env.
- Exit criteria for the UAT-Beta environment:
Test execution is completed, with defect fixes and their impact verified in the UAT-Beta env.

5….Production environment:
- The production environment is where users access the final code after all of the updates
and testing in all the other environments; this one is the most important.
- The production environment is the final stage; it is used by end users.
- This is a client-side environment.
- Entry criteria for the production environment:
Test execution is completed, with defect fixes and their impact verified in the UAT-Beta env.
* The application server and database server are different for each environment.
Defect leakage: a defect which is missed in a previous environment is called defect leakage.

A big project is divided into different parts; those parts are called releases.
DF - defect fix, TE - test execution
Onshore projects & Offshore projects:

If the development, testing and UAT of one project are given to different organizations, then the
quality of the product increases and dependency is not created, but more money is required
and more time is consumed.

Q…Are you involved in UAT? / If there is a defect in UAT, what will be your
approach as a tester?
If there is a defect at UAT (client side), then I will test for that same defect at SIT (software integration
testing). If, after testing, the defect is there, then I will inform the developer to fix that defect.
After the developer fixes the defect, I will retest it in SIT. If, after retesting, the defect is not
fixed, the defect is sent to the developer again. If the defect is fixed, I will ask the developer to
redeploy the new build to UAT.
The second scenario is that if there is a defect at UAT, I will test for that same defect at SIT. If, after
testing at SIT, no defect is found, then I will inform the developer that the defect is not present at SIT.
Then the same defect is tested again at UAT. If the defect is there, we inform the developer to fix it;
once it is fixed, we ask the developer to redeploy the new build to UAT. In this scenario the defect was
raised because of a misunderstanding of requirements / an invalid case / junk data.
Defect life cycle:

If a defect is found, we log the defect in JIRA/HP ALM. When the defect is raised, the status of the
defect is New. When the developer starts working on the defect, the status of the defect is Open.
At this stage there are four possible outcomes from the developer:
1. Fixed: the defect is fixed and ready for testing. If the defect is not actually fixed, it is reopened and the
developer is informed to fix it. If the defect is fixed, it is closed.
2. Rejected: the defect is rejected for any of several reasons, such as duplicate defect, not a defect, or not
reproducible.
3. Duplicate: if the same defect is reported twice, the defect is marked as duplicate by the
developer.
4. Deferred: when the defect is not addressed in that particular cycle, it is deferred to a future
release.

Test case execution life cycle:


The test case execution life cycle contains regression testing and retesting.

Retesting:
- After a defect is detected and fixed, the software should be retested to confirm that the
original defect has been effectively removed. This is called re-testing or confirmation
testing.
- In retesting, we have to run previously failed test cases again on the updated build to
verify whether the defects posted earlier are fixed or not.
- In simple words, retesting is testing a specific bug after it was fixed.
- Defect verification comes under retesting.
- It is carried out before regression testing.

Regression testing:
- Regression testing is done to ensure that previously developed and tested code still performs as
expected after a change.
- Changes that may require regression testing include bug fixes, enhancements, and changes
because of new features.
- In regression testing, we have to re-run previously passed test cases again on the updated build to
verify that the code changes don't affect any other part of the system.
- Every regression test is a retest, but every retest is not a regression test.
Q..How do you identify the scope of regression testing? (Where should regression testing be done, and
how do you identify it?)
In regression testing we are checking the side effects of changing code, so the code changes are the
scope of regression testing. Code changes happen because of bug fixes, enhancements, and changes
because of new features.
Suppose we have three functionalities A, B and C, where B and C depend on each other.
In iteration 1, round 1 testing is done and functionalities A & C pass but B fails. So in iteration 2,
retesting is done only for functionality B; after retesting, B passes.
While retesting, changes happen in the code. So, to check the side effects of the code changes, we do
regression testing in iteration 3. In ideal regression testing we would check only the B & C
functionality, because B & C depend on each other. But in practical regression testing we check the
A, B and C functionality.

Q…What is build confidence testing? or Why is regression testing called build


confidence testing?
- Since regression testing is done after the introduction of new changes in the system, if no
bug is found in the results, then it builds confidence in the development team, because the
changes that they made are working as expected.
- In simple words, regression testing is always done to verify that the modified code does not
affect existing functionality and works as per the requirements of the system.
We do regression testing to check that previously passed test cases are still
working as expected despite the new code changes.

Project life cycle:

Smoke vs Sanity testing:

Smoke testing:
- Smoke testing is nothing but verification of the basic functionality + troubleshooting the
defects in a software build. (We need to check the major functionality of an application.)
- No documents/test cases for this kind of testing.
- No defect is raised.
- Used to check the stability of the software build.
Ex1..login window: able to move to the next window with a valid username and password
on clicking the submit button.
Ex2..user is unable to sign out from the webpage.

Sanity testing:
- Sanity testing is a software testing technique which does a quick verification of the
quality of the software build to determine whether it is testable or not.
- Sanity testing is also called level zero testing / build health check-up testing / dry run /
build stability testing.
- Sanity testing helps in quick identification of defects in the core/basic functionality of an
application.
- No documents/test cases for this kind of testing.
- No defect is raised.
- Used to check the stability of the software build. Sanity testing involves:
Functional testing -
1. URL
2. User credentials
3. Broken images
4. Broken links
5. Add data, update data, delete data, search data
6. Page titles
7. Main menu, submenus

- If sanity testing passes --> detailed-level testing / start executing the test cases.
If sanity testing fails --> the build is rejected and sent back to the dev team.
STLC (software testing life cycle):
ETL Testing Challenges:
 Data loss during the ETL process.
 Large volume of data.
 Invalid and duplicate data in the target system.
 A large number of source data stores.
 Source-to-target mapping information may not be provided to the ETL tester.
 The tester does not have permission to execute the ETL code/job.
 Unstable testing environment.

ETL Tester - Roles and Responsibilities:


 Understand the logical flow of the application/project.
 Understand and review database design documents (logical model vs physical model).
 Understand and review the source-to-target data mapping document.
 Analyse the business rules / data transformations / validation rules provided by the client.
 Create and maintain the test plan document for ETL testing. (> 3 years' experience)
 Identify the test scenarios from the mapping document for ETL testing.
 After that, develop test cases from the test scenarios for ETL testing.
 Create test data and write SQL queries for all test cases.
 Review the test cases.
 After getting the build from the development team, start executing the test cases.
 Run ETL jobs to load the source data into the target system.
 Document test results and log defects for failed test cases.
 Retesting of failed test cases.
 Regression testing for code changes (defect fixes, enhancements, new features).
 Test closure, sign-off from the testing team to the project manager and development team.

ETL testing-
1) Check the source file/table and the target table.
2) Check the ETL code run / ETL job run.
3) Check record counts between the source and target systems.
BI Testing-
1) Report dashboard
2) Report name
3) Report fields
4) Report type
Q…difference between regression testing, smoke testing and sanity testing
- Regression testing: we do regression testing to check that previously passed test cases are still working as expected despite the new code changes.
- Smoke testing: smoke testing is nothing but verification of the basic functionality + troubleshooting the defects in a software build.
- Sanity testing: sanity testing is a software testing technique which does a quick verification of the quality of the software build to determine whether it is testable or not.

PL/SQL: Procedural Language extension to Structured Query Language


1..Stored procedures
2..Triggers
3..Views
Procedures, triggers and views are objects of the database.

1..Stored procedure
What is a Stored Procedure?
A stored procedure is a set of Structured Query Language (SQL) statements with an assigned
name, which is stored in a relational database management system (RDBMS), so it can be reused
and shared by multiple programs.

For example: if you enter 'Pune' in the search box of redBus and click on the search button, in the backend
a query is fired for this action, such as "select * from table name (having data of Pune)", and sometimes a
long query may sit behind this action. This is an action performed on the front end whose impact shows
on the back end. Since the query behind it may be long, instead of writing the long query each time, we
wrap it in a stored procedure and execute the procedure name in place of the long query.
For ex.
Within one schema you can create only one stored procedure with a given name. We create a stored
procedure for a particular query and then use it to call that set of statements. (This is without a parameter.)
Ex. with a parameter

We can pass a dynamic value into the procedure. We have to define the parameter with a data type, and we

can change the value passed for the parameter.


To modify a procedure: use ALTER PROCEDURE (or CREATE OR ALTER PROCEDURE).

To delete a procedure:
Drop procedure procedure_name;

Have you ever interacted with stored procedures? Are you aware of stored procedures?
Yes, I have interacted with stored procedures, but indirectly. Some of the ETL code (migration code)
is written as stored procedures, and our task is to execute it. If there is any error in that code, we
inform the developer.

Where were you involved with stored procedures?


In one of our applications there is functionality for migration of data from source to target, which we
tested. We don't test the code itself; we only check the functionality.

Do you know the syntax of a stored procedure?


Yes, the syntax of a stored procedure is:

Create Procedure <procedure name>
as
-- any SQL statement can be used here (insert, select, delete, update)

Example without a parameter:

Create Procedure Search_Customer
as
Select * from Customer
where BirthDate = '1971-10-06' ;

Exec Search_Customer          -- execution of the stored procedure

Example with a parameter:

Create Procedure Search_Customer1
@BirthDate Date
as
Select * from Customer
where BirthDate = @BirthDate ;

Execute Search_Customer1 @BirthDate = '1971-10-06'

Update Customer set LastName = 'Bais'
Where customer_id = 37 ;

Create or Alter Procedure Search_Customer1
@BirthDate Date,
@Gender varchar(7)
as
Select * from Customer
where BirthDate = @BirthDate and Gender = @Gender ;

Execute Search_Customer1 @BirthDate = '1971-10-06', @Gender = 'F'

Drop procedure Search_Customer1

--1) Without Parameter


Create or Alter procedure Cust_search
as
Select * from Customer
where BirthDate = '1971-10-06' ;

Exec Cust_search ;
drop procedure Cust_search ;

exec Cust_search ;

--
--2) With Parameter

Create procedure Cust_search1


@BirthDate datetime,
@Gender Varchar(10)
as
Select * from Customer
where BirthDate = @BirthDate
and Gender = @Gender ;

Trigger:
What is a trigger?
A trigger is a special stored procedure that runs automatically when various events happen (e.g.
update, insert, delete).
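A minimal sketch, assuming the Customer table from the earlier examples and a hypothetical Customer_Audit table (each CREATE statement run as its own batch), of a trigger that fires automatically after an insert:

CREATE TABLE Customer_Audit (
    customer_id INT,
    inserted_on DATETIME
);

CREATE TRIGGER trg_customer_insert
ON Customer
AFTER INSERT
AS
    -- Record every newly inserted customer in the audit table automatically.
    INSERT INTO Customer_Audit (customer_id, inserted_on)
    SELECT customer_id, GETDATE() FROM inserted ;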

Views:
What is a View?
A view is a virtual table based on the result-set of an SQL statement. We can treat a view like a table.
Types of views:
1) Simple View: created from only one table. We cannot use group functions like max(), count(),
etc., and it does not contain grouped data. DML operations can be performed through a simple view.
2) Complex View: created from one or more tables. We can use group functions like max(),
count(), etc., and it can contain grouped data. DML operations cannot always be performed through a
complex view.
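
Minimal sketches of both types, using the Customer table from the earlier examples (the view names are illustrative):

-- Simple view: built on a single table, no group functions; DML can be performed through it
Create View vw_Female_Customers
as
Select customer_id, LastName, BirthDate
from Customer
where Gender = 'F' ;
go   -- batch separator: each Create View must run in its own batch

-- Complex view: uses a group function, so it contains grouped data and DML is not always possible
Create View vw_Customer_Count_By_Gender
as
Select Gender, count(*) as Customer_Count
from Customer
group by Gender ;
go

Select * from vw_Customer_Count_By_Gender ;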

Difference between stored procedures and triggers?


1. A stored procedure may return a value, whereas a trigger cannot return a value.
2. We can pass parameters to a stored procedure, whereas we cannot pass parameters to a trigger.
3. We cannot call a trigger within a stored procedure, whereas we can call a stored procedure within a trigger.
4. A stored procedure can be written for the database, whereas a trigger is written for a table.

Difference between stored procedures and views?


1. A stored procedure accepts parameters, whereas a view does not accept parameters.
2. A stored procedure cannot be used as a building block in a larger query, whereas a view can be used as a building block in a larger query.
3. A stored procedure can contain statements like if, else and loops, whereas a view can contain only select statements.
4. A stored procedure can perform modifications to the tables, whereas a view cannot perform modifications to the tables.
5. A stored procedure cannot be used as a target for insert, update or delete queries, whereas a view can sometimes be used as a target for insert, update or delete queries.

Q…list a few ETL bugs?


1. User Interface Bug: GUI bugs include issues with color selection, font style, navigation,
spelling check, etc.
2. Input/Output Bug: This type of bug causes the application to take invalid values in place
of valid ones.
3. Boundary Value Analysis Bug: Bugs in this section check for both the minimum and
maximum values.
4. Calculation Bugs: These bugs are usually mathematical errors causing incorrect results.
5. Load Condition Bugs: A bug like this does not allow multiple users, and the user-accepted data is
not allowed.
6. Race Condition Bugs: This type of bug interferes with your system’s ability to function
properly and causes it to crash or hang.
7. ECP (Equivalence Class Partitioning) Bug: A bug of this type results in invalid types of data being accepted.
8. Version Control Bugs: These kinds of bugs normally occur during Regression Testing, when the
build does not provide version details.
9. Hardware Bugs: This type of bug prevents the device from responding to an application
as expected.
10. Help Source Bugs: The help documentation will be incorrect due to this bug.

Q…mention the few test cases and explain them?


Among the most common ETL test cases are:
- Mapping Doc Validation: Determines whether the Mapping Doc contains the ETL
information.
- Data Quality: In this case, every aspect of the data is tested, including number
check (record count), null check (null count), precision check, etc.
- Correctness Issues: Tests for missing, incorrect, non-unique, and null data.
- Constraint Validation: Make sure that the constraints are properly defined for each table.
- Verify the ETL run (ETL job / ETL code).
- Verify the record count between source and target.
- Verify the concatenation transformation of first name and last name as full name.
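
Several of these checks can be written directly as SQL queries during test execution; a minimal sketch, assuming hypothetical source_customer and target_customer tables with first_name, last_name and full_name columns:

-- Record count check between source and target
Select count(*) from source_customer ;
Select count(*) from target_customer ;

-- Null check on a mandatory column in the target
Select count(*) from target_customer where customer_id is null ;

-- Missing records: present in the source but not loaded into the target
Select customer_id from source_customer
except
Select customer_id from target_customer ;

-- Concatenation transformation check: full_name should be first_name + ' ' + last_name
Select s.customer_id
from source_customer s
join target_customer t on t.customer_id = s.customer_id
where t.full_name <> s.first_name + ' ' + s.last_name ;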

Q…What are the conditions under which you use dynamic cache and static cache in
connected and unconnected transformations?
- In order to update the master table and slowly changing dimensions (SCD) type 1, it is
necessary to use the dynamic cache.
- In the case of flat files, a static cache is used.

Q…Explain how ETL is used in data migration projects.?


- Data migration projects commonly use ETL tools. As an example, if the organization
managed the data in Oracle 10g earlier and now wants to move to a SQL Server cloud
database, the data will need to be migrated from source to target. ETL tools can be very
helpful for carrying out this type of migration; without them, the user would have to spend a
lot of time writing migration code by hand. The ETL tools are therefore very useful, since they make
the coding simpler than hand-written PL/SQL or T-SQL. Hence, ETL is a very useful process for data migration
projects.
Q…How ETL testing is used in third party data management?
- Different kinds of vendors develop different kinds of applications for big companies.
Consequently, no single vendor manages everything. Consider a telecommunication
project in which billing is handled by one company and CRM by another. If the CRM
requires data from the company that is managing the billing, the CRM company
will receive a data feed from the billing company. In this case, we use the
ETL process to load the data from that feed.

Q…Explain partitioning in ETL and write its type?


- Essentially, partitioning is the process of dividing up a data storage area for improved
performance. It also helps to organize the data so it is easier to manage.
- The following reasons make partitioning important:
1. Facilitate easy data management and enhance performance.
2. Ensures that all of the system's requirements are balanced.
3. Backups/recoveries made easier.
4. Simplifies management and optimizes hardware performance.
- Types of Partitioning –
1. Round-robin Partitioning: This is a method in which data is evenly spread
among all partitions. Therefore, each partition has approximately the same
number of rows. Unlike hash partitioning, the partitioning columns do not need to
be specified. New rows are assigned to partitions in round-robin style.
2. Hash Partitioning: With hash partitioning, rows are evenly distributed across
partitions based on a partition key. Using a hash function, the server creates
partition keys to group data.
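
SQL Server does not offer hash partitioning as a simple table option, so the sketch below uses Oracle-style syntax purely as an illustration (the sales_fact table and its columns are hypothetical):

-- Rows are spread across 4 partitions by hashing the partition key (customer_id)
Create Table sales_fact (
    sale_id     number,
    customer_id number,
    amount      number
)
partition by hash (customer_id)
partitions 4 ;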

Q…What is the importance of ETL testing?


Following are some of the notable benefits that are highlighted while endorsing ETL Testing:
- Ensure data is transformed efficiently and quickly from one system to another.
- Data quality issues during ETL processes, such as duplicate data or data loss, can also be
identified and prevented by ETL testing.
- Assures that the ETL process itself is running smoothly and is not hampered.
- Ensures that all data implemented is as per the client requirements and provides accurate
output.
- Ensures that bulk data is moved to the new destination completely and securely.

Q…Explain the process of ETL testing.


- ETL testing is made easier when a testing strategy is well defined. The ETL testing
process goes through different phases, as below:

- Analyze Business Requirements: To perform ETL Testing effectively, it is crucial to


understand and capture the business requirements through the use of data models,
business flow diagrams, reports, etc.
- Identifying and Validating Data Source: To proceed, it is necessary to identify the source
data and perform preliminary checks such as schema checks, table counts, and table
validations. The purpose of this is to make sure the ETL process matches the business
model specification.

- Design Test Cases and Preparing Test Data: Step three includes designing ETL mapping
scenarios, developing SQL scripts, and defining transformation rules. Lastly, verifying
the documents against business needs to make sure they cater to those needs. As soon as
all the test cases have been checked and approved, the pre-execution check is performed.
All three steps of our ETL processes - namely extracting, transforming, and loading - are
covered by test cases.

- Test Execution with Bug Reporting and Closure: This process continues until the exit
criteria (business requirements) have been met. If any defects were found in the previous
step, they are sent to the developer for fixing, after which retesting is
performed; moreover, regression testing is performed in order to prevent the introduction
of new bugs during the fix of an earlier bug.

- Summary Report and Result Analysis: At this step, a test report is prepared, which lists
the test cases and their status (passed or failed). As a result of this report, stakeholders or
decision-makers will be able to properly maintain the delivery threshold by
understanding the bug and the result of the testing process.

- Test Closure: Once everything is completed, the reports are closed

Q…What is the difference between the STLC (Software Testing Life Cycle) and SDLC
(Software Development Life Cycle)?
- SDLC deals with development/coding of the software while STLC deals with validation
and verification of the software

Q…What is Bus Schema?


- A bus schema is used to identify the common dimensions across the various business
processes. It comes with conformed dimensions along with a standardized definition of
information.

Q…Explain what is data purging?


- Data purging is a process of deleting data from the data warehouse. It deletes junk data, such as
rows with null values or extra spaces.
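
A minimal sketch of a purge, assuming a hypothetical target_customer table in the warehouse and an illustrative load_date retention rule:

-- Purge junk rows where the key is missing
Delete from target_customer
where customer_id is null ;

-- Purge rows that contain only blanks/extra spaces in a mandatory column
Delete from target_customer
where ltrim(rtrim(LastName)) = '' ;

-- Purge data older than the retention period
Delete from target_customer
where load_date < dateadd(year, -5, getdate()) ;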

Q…Mention what is the advantage of using DataReader Destination Adapter?


- The advantage of using the DataReader Destination Adapter is that it populates an ADO
recordset (consisting of records and columns) in memory and exposes the data from the
DataFlow task by implementing the DataReader interface, so that other applications can
consume the data.

Q…Using SSIS ( SQL Server Integration Service) what are the possible ways to update
table?
- To update table using SSIS the possible ways are:
 Use a SQL command
 Use a staging table
 Use Cache
 Use the Script Task
 Use full database name for updating if MSSQL is used
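
For the staging-table approach above, the SQL command executed after the data flow might look like this minimal sketch (stg_customer and target_customer are hypothetical table names):

-- Update matching rows in the target from the staging table loaded by SSIS
Update t
set t.LastName  = s.LastName,
    t.BirthDate = s.BirthDate
from target_customer t
join stg_customer s
    on s.customer_id = t.customer_id ;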
Q…Explain what is data source view?
- A data source view allows you to define the relational schema that will be used in the
Analysis Services databases. Rather than directly from data source objects, dimensions
and cubes are created from data source views.

Q…Explain what is the difference between OLAP tools and ETL tools?
- The difference between ETL and OLAP tools is that:
- ETL is used to extract data from the source system and load it into the target system
after applying some processing.
- Example: Visual Studio, DataStage, Informatica, etc.
- OLAP, on the other hand, is meant for reporting; in OLAP the data is available in a multi-dimensional
model.
- Example: Tableau, etc.
Q…What are the various tools used in ETL?
- We have used only visual studio along with tableau for reporting purpose.

Q…Explain what is tracing level and what are the types?


- Tracing level is the amount of data stored in the log files. Tracing levels can be classified
into two types: Normal and Verbose. The Normal level logs the tracing information in a summarized manner,
while Verbose logs the tracing information for each and every row.

Q…Explain what factless fact schema is and what is Measures?


- A fact is a measurable event, and the table containing the facts is called a fact table.
- A fact table without measures is known as a factless fact table. It can be used to count the number of
occurring events. For example, it can be used to record an event such as the employee count in a
company.
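
A minimal sketch, assuming a hypothetical factless fact table that stores one row per employee-joining event (only foreign keys, no measures):

Create Table fact_employee_event (
    employee_key   int,
    department_key int,
    date_key       int
) ;

-- The employee count per department is derived simply by counting rows
Select department_key, count(*) as employee_count
from fact_employee_event
group by department_key ;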
Q…Explain what is transformation?
- A transformation is a repository object that generates, modifies or passes data.
Transformations are of two types: Active and Passive.

Q…Explain the use of Lookup Transformation?


The Lookup Transformation is useful for:
- Getting a related value from a table using a column value
- Updating a slowly changing dimension table
- Verifying whether records already exist in the table
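
During testing, the effect of a Lookup transformation can be reproduced with plain SQL joins; a minimal sketch with hypothetical orders, customer_dim and stg_customer tables:

-- Get a related value from a table using a column value
Select o.order_id, o.customer_id, c.LastName
from orders o
left join customer_dim c
    on c.customer_id = o.customer_id ;

-- Verify whether records already exist in the table before inserting
Select s.customer_id
from stg_customer s
where not exists (
    select 1 from customer_dim c
    where c.customer_id = s.customer_id
) ;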
