
Set Analysis vs. If function

Set Analysis and the If function are interchangeable in many cases, and people often wonder which one is faster. In most cases Set Analysis is more efficient, because it takes advantage of pre-built in-memory indexes, while an If function inside an expression often causes a full table scan. For example, to calculate the sales in the shoe department, you can use either sum({<Department={'Shoe'}>} Sales) or sum(if(Department='Shoe', Sales)). The first expression uses the indexes to go straight to the portion of the table that contains Shoe, while the second expression does a full table scan and examines each record to determine whether it belongs to Shoe.

You may wonder why I said "in most cases" instead of "always" when declaring Set Analysis more efficient. Sometimes an If function can be just as fast as, or even faster than, Set Analysis. When a large portion of the table meets the criteria, the time spent checking the indexes can outweigh the savings in the subsequent calculation. Using sum({<Department={'Shoe'}>} Sales) and sum(if(Department='Shoe', Sales)) as examples again, if more than roughly 20% of the records in the Sales table belong to Shoe, the If function might have an advantage, because it takes time for the Set Analysis expression to use the indexes to figure out which records of the fact table contain Shoe data.

Avoid many-to-many key links

QlikView pre-calculates all of the links among tables so that joining tables to calculate an expression is fast. However, this holds best when every link key is a perfect (unique) key in one of the tables it connects. For instance, in the data model below, ShipperKey links the Shipper and SalesFacts tables, and the closer ShipperKey is to a perfect key in the Shipper table, the faster the join can be calculated. When ShipperKey is a perfect key in Shipper, each record in Shipper is linked to multiple records in SalesFacts, making the link one-to-many, which is ideal for QlikView.

If the key field linking two tables is not a perfect key in either of them, then each record in either table is linked to multiple records in the other table (many-to-many), which is very inefficient in QlikView.
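
One way to restore a one-to-many link is to make the key unique on one side, for example with a DISTINCT load on the dimension table; a minimal sketch follows (table and field names are illustrative):

// ShipperKey becomes a perfect key in Shipper, so its link to SalesFacts is one-to-many.
Shipper:
LOAD DISTINCT
    ShipperKey,
    ShipperName
FROM Shipper.qvd (qvd);

SalesFacts:
LOAD
    ShipperKey,
    OrderID,
    Sales
FROM SalesFacts.qvd (qvd);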

Testing a QlikView dashboard:
6 things to check before deploying the application:
1. Perform basic aggregations on the main fact table: Most of you probably use incremental load with a dynamic "Where" clause. In this process we might introduce a logical error while building the "Where" clause. To mitigate this potential bug, check the row counts and the sum totals of the measure fields against the underlying source table. This way you always know that you have extracted the full data set from the underlying source. This should be the first check; for further assistance you can use system fields such as $Field, $Table and $Row.
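
A minimal validation sketch, assuming a SalesFacts table with OrderID and Amount fields is already loaded and the source database is reachable (all names are illustrative):

// Row count and measure total of the in-memory fact table.
Check_Qlik:
LOAD
    Count(OrderID) as QlikRowCount,
    Sum(Amount)    as QlikAmountTotal
RESIDENT SalesFacts;

// The same aggregations pushed down to the source for comparison.
Check_Source:
SQL SELECT
    COUNT(*)    as SourceRowCount,
    SUM(Amount) as SourceAmountTotal
FROM dbo.SalesFacts;
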
2. No aggregation on key fields: Key fields in QlikView should only ever be used as join keys. Make sure you use the "HidePrefix" keyword with a SET variable statement and use the same special character/symbol as the prefix when assigning the field name (for example %CustomerKey instead of [Customer Key]). With this approach neither you nor other developers will accidentally use this field as part of the UI design (you will only see the field if "Show System Fields" is checked). It is also very important not to perform any calculations on this field and not to use the key field as a chart dimension. Instead, you can duplicate the same field under a different name, as sketched below.
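
A minimal sketch of this convention, assuming a Customers source table (field names are illustrative):

SET HidePrefix = '%';    // every field starting with % becomes a hidden (system) field

Customers:
LOAD
    CustomerKey as %CustomerKey,   // join key only, hidden from the UI
    CustomerKey as CustomerID,     // visible copy that may be used as a chart dimension
    CustomerName
FROM Customers.qvd (qvd);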

3. Check Information Density and Subset Ratio: Always perform a high-level integrity check on your data model. You can see the Information Density and Subset Ratio properties in the Table Viewer (Ctrl+T) by hovering over the fields. Investigate wherever Information Density is less than 100% and inform the architect about the potential issue(s) with NULL values. I would always check the Subset Ratio whenever I perform a QlikView join; this way you know how many of the key field's distinct values are associated with the other table.
Definitions of Information Density and Subset Ratio (source – Reference Guide):
o Information Density is the number of records that have values (i.e. not NULL) in this
field as compared to the total number of records in the table.
o Subset Ratio is the number of distinct values of this field found in this table as compared
to the total number of distinct values of this field (that is, counting other tables as well).

Creating a dashboard is not always an easy thing. It requires a lot of collaboration between the consultant, end users, management and the IT department. At element61, our goal is to deliver quality, which means delivering correct data that the end users can rely on day after day. To guarantee the long-term correctness of data, one needs to thoroughly structure the back end and front end and have a verification method in place for validating the data. As Qlik experts, we have the responsibility to get things right. Luckily, Qlik gives us quick and developer-friendly tools to enable these validations. In this insight we want to outline some tips & tricks on how to run and orchestrate such front-end and back-end validations.

The front-end verification

Make sure the functional use cases are clearly discussed & documented

It is best practice to thoroughly detail the use cases; something a developer ideally does together with the business end users. Use cases typically define how a user will use the dashboard: i.e. which dimensions & metrics he wants to see, which filters he will use, etc. Once these are detailed, testers can easily assess each use case one by one. Note that use cases should be verified every time modifications are made to the data model. Going through the use cases is very important to verify whether the application responds well to every requirement that has been asked. This might include functionality but also security rights: e.g. do you have access to sheet X, can you see object Z?

Tip: structure the documentation of your use cases:
Use cases can best be written as

“as a [User Type] I can/should see [Action] in order to [Goal]”


Or "as a [User Type] I can/should see [Amount] for [Selection]"

Create a quality dashboard for raw data

To know if the output data is correct, we base ourselves on the input data. This data serves as our reference, which means that if the input data is wrong, the output data will be wrong. At element61, we offer the customer a quality dashboard that contains exclusively the input data, where each table is independent from the others. This allows simple verification by the developer as well as the customer. If the input data is wrong, it might be that incorrect data has been provided. On the other hand, if the output data is wrong, it must be incorrect logic by the developer. Therefore, a quality dashboard is very important and reduces discussions about the responsibility for the incorrect data.

Testers enjoy having the possibility to check the input data quickly and as often as they want. It is time consuming to manually verify and access every single source used in the dashboard. This dashboard will only be provided to the test users and will serve as a reference for the testers. Do not forget to test the quality dashboard itself to avoid errors.

Use existing documents for data validation

Documents such as a previous/current P&L (for a financial dashboard) can be used to verify the figures at a high level. It is much easier to test against available numbers. It will also show whether the previous data were correct.

The back-end verification

Only start working when business questions are well-defined

Preparation prevents rework. As such, it's important to make sure that we only start to build our data model once we are sure it allows us to answer the required business questions. A data model that displays correct - yet unnecessary - information remains useless. The data model must be built with the business demands in mind and contain only the necessary data.

Perform basic aggregations on the main tables

Every fact table must contain the exact same number of rows as its raw equivalent. Similarly, joins should not cause data loss where there should not be any (e.g. a left join). This can be validated: some basic calculations can be made to compare the raw data with the aggregated/joined data.

Specifically, when using joins, validating the number of rows an aggregation produces prevents the loss of any important data. Simple formulas such as =sum(amount), =count(customerID) and =count(DISTINCT customerID) can be used.

Refrain from aggregations on key-fields

Qlik automatically links two fields that have the same name (these linking fields are also called keys). When there is a link between two tables, one should avoid aggregating on the key fields, because Qlik will count values from both tables. The aggregation is therefore likely to be incorrect and not reflect your actual data.

Tip: To avoid any mistake, the "HidePrefix" system variable should be used: i.e., when you label your keys with '%', all keys will be hidden and thus can't be used for aggregations and calculations. E.g. use %CompanyID instead of [Company ID].

Check for Information Density and Subset Ratio

 Information density indicates the percentage of rows in the table that contain a (non-NULL) value for the field.
 Subset ratio (only relevant for keys) shows the percentage of all distinct values for a field in the table compared to all the distinct values for that field in the entire data model.

Both indicators are paramount when verifying the data model. They indicate whether the chosen key is a quality link between two tables. If there are no or few common values between the two tables, it means that the key cannot be used or that the key is not precise enough. Checking these indicators saves a lot of time and can be done before joining two tables.

Tip: If the subset ratios of a key in the two tables sum to 100%, the tables have no common values for that key.

Check for connection strings in the Qlik script

Connections to various databases are normal in a Qlik project. Errors can easily arise when changing the source. It is important to check the connection strings, and also to show on the dashboards where the data comes from and which environment is used.

Tip: It is wise to keep connection strings in variables to avoid refactoring of code.
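
A minimal sketch of this tip, assuming one ODBC DSN per environment (variable and DSN names are illustrative):

LET vEnvironment = 'QA';                              // switch DEV / QA / PROD in one place
LET vConnection  = 'SalesDWH_' & '$(vEnvironment)';   // resolves to e.g. SalesDWH_QA

ODBC CONNECT TO [$(vConnection)];

// The environment can also be shown on the dashboard, e.g. in a text object: =vEnvironment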

4. Check the connection strings in the QlikView script: Logical bugs are very difficult to identify. Generally, you might need to extract data from more than one source, and sometimes you need to extract data from multiple environments where the underlying schema is the same. You might be extracting data from Dev and QA, which have the same schema and table names but different data. In this case it is very hard to debug, because everything about your query is right except that you are using the old connection string. So make sure you abstract the connection strings to an Excel file/database so you can manage them from one central place.
5. No syntax error doesn't mean no logical error: Set Analysis is a great feature in QlikView. It is very useful for controlling, identifying and modifying a subset of the data. However, the absence of syntax errors does not mean there are no logical errors. Syntax errors are your friends' enemies, but logical errors are YOUR enemies: they are more dreadful and harder to identify than syntax errors. So make sure you always check the Set Analysis expressions against equivalent SQL queries, where a set modifier is equivalent to the SQL "where" clause and set operators correspond to relational operators in SQL.
6. Check for intruders in the dimension tables (AKA NULL values): As a rule, we shouldn't have null values in the dimension tables. You would always expect your dimension fields to have 100% information density, but the real world is different from theory. So it's important to keep an eye on the dimension tables, because it's equally important to know what is missing compared to what is available!

A few simple inspections can be performed first, as these require less time and effort to review.

 Review the connection strings of the databases that are used within script.
 Review the SET variables to ensure that the proper file locations are being referenced.
 Review the script to make sure that the proper flat files are being referenced.
o Best Practice: List connection and flat file locations on the first tab within the Script
along with comments to remind one to review prior to release.
Next, check integrity of the data model, making sure that the data is presented accurately for business
users.
 Does the model build without errors? If yes, that is great but there could be a logic error that is
still lurking in the dark. An approach to finding logic errors is to compare Set Analysis to SQL
queries.
 Verify the tables by row counts and totals of the measures within the fact tables as compared to
the source tables.
 Key Fields should not be used for any calculations within the UI. If a key is needed within the UI
create a copy of the key and rename it.
 Review composite keys making sure that the data types align.
 Information Density less than 100% could mean NULL values are present and will need to be
reviewed.
Once these assessments have been performed, move along to the UI and create a few test cases to spot-
check any variables that are in use. Text boxes are a great way to watch variables and how they are
affected during the test case.
The goal of data validation is to get the developer thinking of any possible oversights prior to
release. The rapid development cycle allows for iterations so it will not be uncommon for an exception to
pop up that will require changes to the existing data model or how a table is joined. This is the joy of
rapid development within QlikView.
Happy modeling! 🙂

Information Density: Information Density is the number of records that have values (i.e. not NULL) in
this field as compared to the total number of records in the table.
Subset ratio: Subset ratio is the number of distinct values of this field found in this table as compared to
the total number of distinct values of this field (that is other tables as well).
Circular Loops: It is undesirable to have multiple common keys across multiple tables in a QlikView
data structure, as this may cause QlikView to create circular references to generate the connections in the
data structure. Circular references are generally resource heavy and may slow down calculations and, in
extreme cases, overload an application.

-Comment out the fields in the load script
-Rename the fields in the load script
-Qualify the field names (renaming them to TableName.FieldName) using the QUALIFY / UNQUALIFY statements
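
A minimal sketch of the qualify approach, assuming Orders and Shipments tables that share both OrderID and Region (all names are illustrative):

QUALIFY *;            // rename every field to TableName.FieldName
UNQUALIFY OrderID;    // except the intended link key

Orders:
LOAD OrderID, Region, Amount FROM Orders.qvd (qvd);

Shipments:
LOAD OrderID, Region, ShipDate FROM Shipments.qvd (qvd);

// Region is now Orders.Region and Shipments.Region, so only OrderID links the tables.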

Information density And Subset ratio:

Information density shows the percentage of records containing non-null values in the field as compared to the total number of records in the table.
Subset ratio shows the percentage of the number of distinct values of this field in a table compared to the total number of distinct values of this field present in the entire data model.
Selecting arbitrary date range in Qlikview:
https://community.qlik.com/t5/QlikView-Creating-Analytics/Selecting-Arbitrary-Date-Ranges/td-p/407268

Error handling
ErrorMode
ErrorMode is a system variable, usually set at the beginning of the script.
-ErrorMode = 0 – ignores all script errors and continues refreshing the application.
-ErrorMode = 1 – (the default) halts script execution and prompts the user with a script error dialog box.
-ErrorMode = 2 – stops execution when there is an error in a table load but shows the dialog as if execution completed; once you click Close you get a popup saying "script execution failed".

Syntax :
SET ErrorMode = 0;

ScriptError
-Returns the error code of the last executed script statement.
-It is reset to 0 after each successfully executed script statement.

-If an error occurs it will be set to an internal QlikView error code. Error codes are dual values with a
numeric and a text component.

Below are the different error codes


Error Code Error
0 No error
1 General error
2 Syntax error
3 General ODBC error
4 General OLE DB error
5 General custom database error
6 General XML error
7 General HTML error
8 File not found
9 Database not found
10 Table not found
11 Field not found
12 File has wrong format
13 BIFF error
14 BIFF error encrypted
15 BIFF error unsupported version
16 Semantic error

ScriptErrorCount:

-Returns the total number of statements that have caused errors during the current script execution.
-This variable is always reset to 0 at the start of script execution.
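
A minimal error-handling sketch combining these variables (the file name is illustrative):

SET ErrorMode = 0;                      // do not halt on errors

Sales:
LOAD * FROM MissingFile.qvd (qvd);      // may fail, e.g. error code 8 (File not found)

IF $(ScriptErrorCount) > 0 THEN
    TRACE $(ScriptErrorCount) script statement(s) have failed so far;
END IF

SET ErrorMode = 1;                      // restore the default behaviour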

Value list and value loop:


These are used to create artificial (synthetic) dimensions in charts and tables. ValueList takes an arbitrary list of values and creates a dimension; ValueLoop creates a dimension from a number sequence.
Syntax: ValueList('Value1', 'Value2', 'Value3')
ValueLoop(StartValue [, EndValue [, Step]])

valueloop(1, 3) returns the values 1, 2, 3 (step is omitted, so 1 is assumed)
valueloop(1, 5, 2) returns the values 1, 3, 5 (from 1 to 5, step 2)
valueloop(11) returns the single value 11

valuelist(1, 10, 100) returns the values 1, 10 and 100

valuelist('a', 'xyz', 55) returns the values 'a', 'xyz' and 55
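
A hedged example of using ValueList as a calculated chart dimension (the Actual and Budget fields are illustrative):

// Calculated dimension:
=ValueList('Actual', 'Budget', 'Variance')

// Chart expression - the same ValueList call is repeated to test which synthetic value is evaluated:
=Pick(Match(ValueList('Actual', 'Budget', 'Variance'), 'Actual', 'Budget', 'Variance'),
      Sum(Actual),
      Sum(Budget),
      Sum(Actual) - Sum(Budget))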

Dimension & measure tags and their uses:


We can create tags in the load script using the command
Tag Field|Fields field_name with tag_name;
or we can create tags under Document Properties --- Tables tab.
Uses: Tags are useful when users accessing your document create their own visualizations (they can identify
which fields are measures and which are dimensions). We can see these tags when hovering the cursor over fields in the Table Viewer.
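
A minimal sketch (field names are illustrative; '$measure' and '$dimension' are commonly used tag names):

Tag Field Sales with '$measure';
Tag Fields Year, Month, Region with '$dimension';
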
Different file and QVD functions:
Attribute(filename, attributename)
ConnectString()
FileBaseName()
FileDir()
FileExtension()
FileTime([filename])
FileName()
FileSize([filename])
GetFolderPath()
QvdCreateTime(filename)
QvdNoOfRecords(filename)
QvdNoOfFields(filename)
QvdFieldName(filename, fieldno)
QvdTableName(filename)
 File functions query metadata of the file system; QVD functions query metadata in the QVD XML header.
 A QVD is also a type of file, so many of the file functions also work on QVDs.
How to store specific fields to a QVD file:
Store field1, field2 from table_name into fields.qvd (qvd);
Alt() function and its use:
Alt() is a conditional function (like an extended if condition).
It returns the first parameter that has a valid number representation; if no match is found, it returns the last
parameter. Syntax: Alt(case1 [, case2, case3, ...], else)
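
A hedged example, assuming an OrderDate field that arrives in mixed date formats (file and field names are illustrative):

Orders:
LOAD
    Alt( Date#(OrderDate, 'YYYY-MM-DD'),
         Date#(OrderDate, 'MM/DD/YYYY'),
         'Unknown date' )  as CleanOrderDate,
    *
FROM Orders.csv (txt);
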
Link Table:

A link table is a table consisting of the common fields of two or more tables, whether from the same data source or not.
It is used to solve data model issues such as synthetic keys and loops. The main idea is to achieve a star schema by creating a
central table that is connected to all other tables.
Synthetic keys? How to avoid them:
When there is more than one common field between two or more tables, QlikView automatically creates synthetic
keys and synthetic tables.
Issue with synthetic keys:
When the number of synthetic keys increases, then depending on data amount, table structure and other factors, QlikView may
or may not handle them gracefully; it may end up using an excessive amount of time and memory, which leads to poor
performance of the data model.
On the other hand, when you have a lot of rows and a key created from multiple columns, loading can be much faster with a
synthetic key than with field concatenation.
Methods to avoid synthetic keys:
1) Removing fields: If the common fields causing the synthetic key are not required in the data model, and removing them
will not affect the relation between the two tables, then we can remove the common fields by commenting them out or
removing them from the load script.
2) Renaming fields: If the common fields causing the synthetic key are not really the same field (they do not have similar
values) but are two different fields with the same name, they can be renamed using the AS clause. We
can do the same using the Qualify statement; with a qualified statement, field names are converted into the
TableName.FieldName format.
Ex: Qualify Region;
Load branch AS Branch_name,
Region
From emp.xlsx;
3) Autonumber / composite key: When we know that the common fields causing the synthetic key are important
for our data model, we need to create our own key to handle the composite key. To achieve this we can
use the autonumber / autonumberhash128 / autonumberhash256 functions. These create a unique integer (or a 128-bit
or 256-bit hash value, respectively) for each distinct combination of the concatenated fields. Please note that
autonumber may be problematic in applications generating QVD files for use in other QlikView applications,
because the generated numbers are only consistent within a single script run.
Ex:
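A minimal sketch of such a composite key (table and field names are illustrative):

Orders:
LOAD
    AutoNumber(OrderID & '|' & OrderLineNo) as %OrderLineKey,
    OrderID,
    OrderLineNo,
    Amount
FROM Orders.qvd (qvd);

OrderDetails:
LOAD
    AutoNumber(OrderID & '|' & OrderLineNo) as %OrderLineKey,
    ProductID,
    Quantity
FROM OrderDetails.qvd (qvd);

// Only %OrderLineKey is shared between the two tables, so no synthetic key is created.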

4) Concatenating similar tables: When we have multiple common fields between two tables, we cannot remove or
rename all those fields, as they are significant and related to each other. If the tables also contain some differing fields,
QlikView will not automatically concatenate them, so we need to force concatenation using the Concatenate prefix to
combine the two tables into one.
5) Link table: If we have more than one fact table causing synthetic keys, with some differing fields, so
that we cannot go for concatenation, and we still need them for our analysis, then we go for a link table.
A link table is a table consisting of the common fields of two or more tables, whether from the same data source or not.
Rules for defining a link table (a sketch follows this list):
 Create a key based on the common fields of the fact tables and break all other associations by commenting out or
renaming fields.
 Make sure that all combinations that exist in the fact tables are available in the created link table,
otherwise some records may be missed.
 The link table must have distinct records.
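
A minimal link-table sketch for two fact tables that share Date and Region (all names are illustrative):

Sales:
LOAD Date & '|' & Region as %LinkKey, Amount
FROM Sales.qvd (qvd);

Budget:
LOAD Date & '|' & Region as %LinkKey, BudgetAmount
FROM Budget.qvd (qvd);

// The link table holds every distinct key combination together with the shared dimensions.
LinkTable:
LOAD DISTINCT Date & '|' & Region as %LinkKey, Date, Region
FROM Sales.qvd (qvd);
CONCATENATE (LinkTable)
LOAD DISTINCT Date & '|' & Region as %LinkKey, Date, Region
FROM Budget.qvd (qvd);

// Keep only distinct records in the final link table.
FinalLink:
NOCONCATENATE LOAD DISTINCT %LinkKey, Date, Region RESIDENT LinkTable;
DROP TABLE LinkTable;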

SCD (Slowly Changing Dimension):


Type 1: This methodology overwrites old data with new data and therefore does not track historical data. A common
use is correcting misspelled names.
Type 2: This method tracks historical data by creating multiple records for a given natural key in the dimension
tables, with separate surrogate keys and/or different version numbers. Unlimited history is preserved for each insert.
Type 3: This method tracks changes using separate columns and preserves limited history.

SetStateName in button actions:


If you have alternate states in the document, several new actions become available that you can trigger from a button.

You can use the "Set State Name" action on a button to change which alternate state a sheet object
belongs to. This can come in handy in some situations to build more sophisticated user interfaces.

Preceding Load:

Thread 1:
Loading data with a LOAD statement that reads from the LOAD/SELECT statement below it is called a preceding load.

For example

LOAD //This section is called preceding load


*,
Month(Date) AS Month,
Year(Date) AS Year;
LOAD // Normal load.
Department,
Date,
Sales
FROM DataSource;

Resident:
Loading data from an already loaded table (using the RESIDENT keyword) is called a resident load.

For example

TableName:
LOAD
*
FROM DataSource;

Data:
LOAD //Resident Load
*,
Month(Date) AS Month,
Year(Date) AS Year
RESIDENT TableName;

Thread 2:
Perhaps your perception that, in the case of a resident load, the data gets copied into RAM a second time is not right.
QV probably would not load it twice, but rather fetch the already loaded (in-RAM) data.

And your statement that, in a preceding load, the data is fetched from the data source only once, goes through some
transformations on the fly in the preceding steps, and is then finally loaded into RAM, is probably not accurate either.
The data is already in RAM when any operation is performed on it. A preceding load is like
working in a pipeline in a bottom-up approach: some operation is performed on a record (note: records are processed
one by one) and another operation (the preceding load) operates on the result of the first operation.

Now I would explain why a preceding load is generally faster like this:
in a preceding load, multiple operations are done in a sequence/pipeline record by record, so the QV engine
fetches the same record from RAM only once for multiple operations, whereas
in a resident load the same record has to be fetched multiple times for multiple operations.

Preceding Load:

A QlikView feature that is poorly known and brilliant in its simplicity is the Preceding Load.

It is a way to define successive transformations and filters so that you can load a table in one pass but still
have several transformation steps. Basically it is a Load statement that loads from the Load/SELECT
statement below.

Example:
You have a database where your dates are stored as strings and you want to use the QlikView date
functions to interpret the strings. But the QlikView date functions are not available in the SELECT
statement. The solution is to put a Load statement in front of the SELECT statement: (Note the absence of
“From” or “Resident”.)

Load Date#(OrderDate, 'YYYYMMDD') as OrderDate;
SQL SELECT OrderDate FROM … ;

What happens then is that the SELECT statement is evaluated first, and the result is piped into the Load
statement that does the date interpretation. The fact that the SELECT statement is evaluated before the
Load is at first glance confusing, but it is not so strange if you read a preceding load as

Load From ( Select From ( DB_TABLE ) )

Any number of Loads can be “nested” this way. QlikView will start from the bottom and pipe record by
record to the closest preceding Load, then to the next, etc. And it is almost always faster than running a
second pass through the same table.

With preceding Load, you don’t need to have the same calculation in several places. For instance, instead
of writing

Load ... ,
Age( FromDate + IterNo() - 1, BirthDate ) as Age,
Date( FromDate + IterNo() - 1 ) as ReferenceDate
Resident Policies
While IterNo() <= ToDate - FromDate + 1 ;

where the same calculation is made for both Age and ReferenceDate, I would in real life define my
ReferenceDate only once and then use it in the Age function in a Preceding Load:

Load ..., ReferenceDate,
Age( ReferenceDate, BirthDate ) as Age;
Load *,
Date( FromDate + IterNo() - 1 ) as ReferenceDate
Resident Policies
While IterNo() <= ToDate - FromDate + 1 ;

The Preceding Load has no disadvantages. Use it. You’ll love it.

How to drop a variable at script level

let vari = now();
let vari2 = today();

let vari = ;        // assigning nothing drops the variable

let vari2 = null();

Can we use a where clause and group by in a preceding load?

Yes, we can.

Ex:
Tab1:
Load empname,
Date,
max(salary) as maxsal
where Date = '01/10/2017'
group by empname, Date;
Load empname,
Date,
salary
from ---------- (.txt);

NPrinting Vs PDF Distributor


1. If for some reason you purchased the PDF Distribution add-on (like what happened in one of
the projects I am currently working on), then take advantage of it, because like you said
it is quite expensive (as expensive as purchasing Publisher again).
2. If you're evaluating different reporting options, then to be honest, with NPrinting you can
do a lot more than with PDF Distribution. In general terms, NPrinting is a much more
robust solution for reporting.

I'll try to give you the highlights of every solution:

PDF Distribution:

 Natively integrated with the QlikView environment. You just have to acquire the license and
then you can integrate it with Publisher.
 You can distribute reports only in PDF format. You have to build the reports in QV
Desktop, and then in the Management Console you can reduce and/or loop your reports to
generate multiple outputs.
 You can send reports by e-mail even to recipients who are not part of your domain.
 The GUI for building reports is quite messy, and you don't have pixel precision.
 You can apply filters (by field) or bookmarks to your reports; however, you can't combine
bookmarks with field selections.

NPrinting:

 Lets you create and distribute reports in Office formats (Word, PPT, XLS), PDF, image
formats, and even HTML reports.
 It works as a client-server architecture. The NPrinting console is the equivalent of the
Management Console in QlikView, so the processes are executed in the background.
 Tasks for creating and distributing reports are a lot more developed than in PDF
Distribution; you can apply multiple filters to a single report (for example combining field
selections, bookmarks and filters by variable values).
 You can create standalone filters, like having a "filter library", so you can reuse them for
different reports.
 You can even have Publisher functionality (reduce, loop, distribution) of QVWs. So, if
you haven't purchased Publisher, NPrinting is a cheaper option.
 You can create pixel-perfect reports, so it's like having a Photoshop worksheet where
you can add shapes and arrows and put every element exactly where you want it.
 Recently, NPrinting added a functionality called NPrinting On-Demand, where users can interact
with the reports, so they can run report tasks directly from apps or even from a web portal
where they can request new reports.
 In general, NPrinting is an application focused on reporting. In my humble opinion, QlikView is
awesome at visualization, analysis and everything we know about QlikView (business
discovery, associative experience, etc.), however the reporting side is quite poor
compared with what you can do with NPrinting. So I think NPrinting is a wonderful add-on for
covering reporting requirements in QlikView.

Optimized QVD:
Generally, QVDs are one of the most important components in any QlikView deployment.

A QVD (QlikView Data) file is a file containing a table of data exported from QlikView. QVD is a native
QlikView format and can only be written to and read by QlikView. Reading data from a QVD file is
typically 10-100 times faster than reading from other data sources.

More information on QVD Load


- Increasing Load Speed
- Decreasing Load on Database Servers
- Consolidating Data from Multiple QlikView Applications
- Incremental Load

Whenever we use a QVD as a data source, we want to keep the load optimized for better reload times.
But in some cases the load is not optimized, especially if we use filters with a Where clause.

On the other hand, we can keep loads optimized in some scenarios:

LOAD * FROM QVDANAME.qvd (qvd) Where Match(REGION_NAME,'US','CANADA');

In the above case we are loading the data only from the US and CANADA regions, and this is an unoptimized
load. But we can make it an optimized load by using the EXISTS keyword with a temp table.

First, create the temp table with the filter values and use the EXISTS keyword in the main QVD load statement, like
below:
TEMP:
LOAD * INLINE [
REGION_NAME
US
CANADA];

LOAD * FROM QVDANAME.qvd (qvd) Where EXISTS (REGION_NAME);

DROP Table TEMP;

Note: Please make sure that TEMP has the same field name as the QVD.

Optimized QVD:

In previous articles I have mentioned how critical it is to ensure your loads from QVD are optimised, but have not gone into the detail of how to do this. This post rectifies that. Here I explain what an optimised load is, why you should use them and how to perform them.

Optimized QVD loads are up to 100 times quicker than non-optimized ones. That makes a lot of difference if you are watching a reload dialog; it even makes a lot of difference to your server performance if your reload is running on a schedule.

The reason for the vast difference is related to the much publicized compression algorithm that QlikView uses when storing data for in-memory analysis. QVD files are stored in a format that mirrors the compression used in memory (which is why QVD files are so small on disk), and during an optimized load the data is sent directly from disk to memory in the same compressed format. When a non-optimised load is performed this is not the case.

So why not make all loads from QVD optimized? The simple fact is that some operations require the data to be unpacked, modified and then re-packed. This significantly slows the process. Just about any change to the data on the way out of the file and into memory will cause a load to be non-optimised.

Some examples of things that will cause a non-optimized load are:

– Adding new fields to the table
– Deriving new values from a field in the QVD
– Retrieving a field twice
– Most WHERE conditions
– Joining to an existing in-memory table
– Loading data into a mapping table

In contrast, the things you are allowed to do are:

– Rename fields
– Omit fields
– Do a simple one-field WHERE EXISTS on a field returned in the record set

This sounds hugely restrictive, but then most things you would want to achieve can be coded for. For example, if you need to add fields, do this in the QVD generate routine rather than when reading the QVD. Similarly, if you need to derive a value, do this when you generate the QVD also. Even complex WHERE statements can be handled by deriving flags or composite keys in the QVD generate routine and then doing a simple WHERE EXISTS on a temporary table (even if that temporary table is just a single row from an inline table).

In fact, optimized QVD loads with a WHERE EXISTS clause on each subsequent load statement are a simple but effective way of quickly building documents which contain related subsets of data – but that is something for another post.

So, how do you know if your load is optimised? Well, the first way is by noticing it is still running when you return to your desk with a fresh cup of coffee. The other is by checking the load progress dialog. Optimised loads show the text "qvd optimized" as the data is being pulled from the QVD – in contrast no message is shown when the load is non-optimised.

Always look out for that text, and if it is not there when loading from a QVD then there will be merit in reviewing your load script to make the load optimized.

Where the incredible speed of an optimized load is really essential is when you look to build an incremental load strategy, where fresh data from source databases is combined with data previously stored in QVDs. If the retrieval of the old data is not quick and efficient then the whole point of the incremental load is eroded.

It is worth noting here, however, that a non-optimised load from a local QVD file will still typically be much, much faster than from any other data source. Sometimes non-optimised loads cannot be avoided (or the development required to avoid them is not worth the time saving).

Hopefully this article has given you the information you need to make sure your loads are optimised, or to make an informed decision to allow non-optimised loads.

As I have said in previous articles, the back end of your QlikView document is at least as important as the front end – and ensuring optimised loads is an important part of getting the back end correct.

Q) When you say QVD generate routine, do you mean a resident load?

A) No, by QVD generate routine I mean the QVW file that is used to load data from source and write out a QVD. This should be
kept separate from the QVW which has your analysis in it. It is in this routine that all of the manipulation of fields (such as building
composite keys) should be done.

Q) What is best practice in optimizing Group By loads for large tables? Should you group by as few fields as possible
and use aggregate functions such as FirstValue() for text fields along with your max(), sum() etc., or should you use
fewer aggregates and group by as many fields as possible while still allowing for the specific group level you are after?

A) That is a good question! I would always look to put all fields in the group by statement – simply to avoid the risk of
removing values accidentally with FirstSortedValue – the only aggregation functions I would typically use would be to
aggregate numeric fields. If optimizing performance to the n'th degree is important you would have to try
benchmarking over a serious amount of data. My gut feeling is that more group by fields would be more performant –
but QlikView can often surprise on things like this.
Q) Please explain how Where Exists() works.
A) WHERE EXISTS compares data in a field that has already been loaded during the load script to data in the table that is
presently being loaded. In its simplest form, the field name in memory is the same as the field name in the file being loaded (this is a
prerequisite for an optimised load). You can think of it as doing an inner join between the table being loaded and some
previously loaded data.

Q) I used the WHERE EXISTS condition, but I could not find any data in the table. I am loading data for
2014, 2015, 2016, 2017.

A) Were you doing a match against a field with just the year in the QVD? How did you create the
field you were looking up against?
One potential problem is with data types, if the year is stored as a string in the QVD, and a numeric
in the data model the WHERE EXISTS will return no rows. The code should be as simple as:
Temp_Year:
LOAD
Year
INLINE [
Year
2014
2015
2016
2017
];
MainData:
LOAD
*
FROM MyQVD.qvd (qvd)
WHERE EXISTS (Year)
;
DROP TABLE Temp_Year;
If you run the code without the WHERE EXISTS line and the DROP, and then add the Year field as a
listbox in the app, you will see whether the values from both tables are the same (they will show in the same
listbox) or whether the years from the two sources show up as separate values.

Tips:
A tip for keeping your loads optimized when concatenating two tables:
if the second (concatenated) table contains all the fields of the first plus some additional ones, both loads will be optimized.
Example 1 – Only first load will be optimized
TABLE:
LOAD
FIELD1,
FIELD2,
FIELD3
FROM
B.QVD
(qvd);
concatenate(TABLE)
LOAD

FIELD1,
FIELD2
FROM
A.QVD
(qvd);
Example 2 – Both loads are optimized
TABLE:
LOAD
FIELD1,
FIELD2
FROM
A.QVD
(qvd);
Concatenate (TABLE)
LOAD
FIELD1,
FIELD2,
FIELD3
FROM
B.QVD
(qvd);

Sometimes you will find that you need to add dummy fields, perhaps with null values, to some QVDs so that they concatenate onto
others.
Another amazing fact is that a LOAD DISTINCT from QVD can be optimized too. So, in my case I could use it to solve a slow Where
condition:
/* Slow Load, not qvd optimized:
Bookings:
LOAD * From Bookings.qvd (qvd)
Where ID>0; // ..Or with other condition: Not IsNull(ID)
*/
// qvd optimized
ExistingID:
LOAD DISTINCT ID From Bookings.qvd (qvd);
// qvd optimized
Bookings:
LOAD * From Bookings.qvd (qvd)
Where Exists(ID);
Drop Table ExistingID;

Another one is if you load from a QVD with a LEFT JOIN prefix that can now be optimised (as long as the other
criteria are met about not modifying data).

 A QVD stores unique data with pointers to the actual values. Every field has its own table of distinct values, and
bit-stuffed pointers are used to look up the actual values. The result of this kind of storage
is better compression. This compression is done every time the load script is run.

What is the use of Export Sheet Layout?
When we want to preserve the layout of a sheet to reuse it later, we export the sheet layout, which creates
an XML file without any data.

What is Webview Mode?


The WebView mode uses the internal web browser in QlikView to display the document layout as an
AJAX page.

What is Fuzzy search in QlikView?


Fuzzy search finds all the values according to their degree of resemblance to the search string. Which
means, even if the spelling does not match character by character, those results will also be shown.

What is a Bookmark in QlikView?


A bookmark in QlikView captures the selections in all states defined in a QlikView document. It can be
saved and accessed later.

What is a user bookmark and a shared server bookmark?


The User bookmark is saved in the user computer while the shared server bookmark is saved in the server
and accessible to all the allowed users.

What is a selection indicator in QlikView Document?


A selection indicator is used to indicate the type of association between the data present in different sheet
objects. A green dot indicates selected values, a blue dot indicates locked values, and a red dot indicates
de-selected values in AND mode.

When do we need to use the option “Force 32 Bit”?
When connecting to a database using ODBC, if the data source only provides a 32-bit driver, we use this
option.

What is the difference between QVX and QVD files?


The QVD file format is proprietary and optimized for minimal transformations inside QlikView, while the QVX
file has an open file format that describes both the table structure and the table data in it.

What is Garbage option in the Data Transform wizard?


The Garbage option is used to mark and delete data that is not required, or that is jumbled and not
useful.

What feature does the Fill Tab in data transform wizard provide?
The fill feature is used to fill in empty cells with values from adjacent cells.

How can we split the data in a table vertically or horizontally?


The data in a table can be split by using the unwrap transformation.

What is Context cell Expansion in QlikView?


Context cell expansion is used to expand the contents of one cell into several cells in the table.

How can we drop some fields from the memory during script execution?
We can use the statement Drop field A;

What is a Mapping Table?


A mapping table is a temporary table that provides a mapping of values from one column in the first table to
another column in the second table. It has only two columns, and it is dropped after script execution.

What is the difference between NullAsValue and NullAsNull?


NullAsValue allows linking of data on fields whose values are null, whereas NullAsNull treats the null values as missing values
and does not allow any linking between such values.

How can we get the number of statements which have caused errors during a script execution?
By using the ScriptErrorCount system variable.

What is the value of X in the following code?


Set VAL = '$1*$2';
Let X = $(VAL(6, 4, 9));
ANS: 24

Compare QlikView and Tableau

Criteria                              Tableau        QlikView
Data integration                      Exceptional    Good
Working with multidimensional data    Very Good      Good
Support for PowerPoint                Available      Not available
Visual drilldown                      Good           Very Good
Scalability                           Good           Limited by RAM
2. What kind of charts do we use in QlikView?

We generally use bar charts, line charts, combo charts, scatter charts, grid charts, etc.
3. Explain Set Analysis in QlikView.

Set Analysis defines a set (group) of data values that is independent of the current selections. It is mostly used inside aggregation functions, e.g. sum({$<Year={2017}>} Sales).
4. Define Trellis chart.

In a trellis chart we can create an array of charts based on the first dimension. Bitmap charts can also be shown
as a trellis display.
5. Explain Mini Chart. What do you mean by sub reports and how can we create them?

With the help of a mini chart we can show small charts (for example sparkline-style trends) inside the cells of a
table instead of plain values. We can also change the colors.
6. What is a Pivot Table?

Pivot table:
A pivot table is better for grouping. We can also show a pivot table as a cross table,
which is a useful feature. One disadvantage is that if we have to sort a
pivot table, we have to sort it first according to the first dimension and then according to the next one.
7. Which graph would we use to show the sales difference between two years?

We would use a bar graph.


8. What is a Straight Table?

A straight table is much better than a pivot table for sorting, as we can sort it by any column of our choice.
However, it is not good for grouping.
9. How many dimensions can we use in a bar chart?

We can use only two dimensions.

10. Which QlikView objects have only an expression and no dimension?

Gauge charts and list boxes have only an expression and no dimension.
11. How can we use macros in our application?

We can use macros for various purposes, such as reloading the application and creating objects.
12. What do you understand by layers in QlikView?

Layers are set on the Layout tab of the sheet object properties, where Bottom, Normal and Top
correspond to the numbers -1, 0 and 1 respectively.
13. What is Dimensions?

Dimensions allow data examination from various perspectives.


14. Explain Normalized Data.

Normalized data is a well-structured form of data which does not have any repetition or redundancy; it is a
kind of relational data, mainly used in OLTP systems. Denormalized data is a
whole bunch of data without relationships among the tables and with redundancy of data; it is
mainly used in OLAP systems.
15. What is a Star Schema?

The simplest form of dimensional model, in which data is arranged into facts and dimensions,
is known as a star schema.
16. What is a Snowflake Schema?

A snowflake schema is a variant of the star schema in which dimension tables are normalized. Snowflaking is used
to improve the performance of particular queries.
17. Explain interval match.

IntervalMatch is a prefix to the Load statement which is used for connecting discrete
numeric values to one or more numeric intervals.
18. Explain the IntervalMatch() function.

The IntervalMatch function can be used to generate data buckets (intervals) of different sizes.
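
A hedged IntervalMatch sketch (table and field names are illustrative):

Events:
LOAD * INLINE [
EventTime, EventName
3, Start
17, Alarm
];

Shifts:
LOAD * INLINE [
ShiftStart, ShiftEnd, Shift
0, 8, Night
8, 16, Day
16, 24, Evening
];

// Creates a bridge table linking each EventTime to the interval(s) it falls into.
IntervalMatch (EventTime)
LOAD ShiftStart, ShiftEnd RESIDENT Shifts;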


19. What is Container?

A container object is used to keep multiple charts. We can use a container object to keep many
charts in the same box.
20. What do you understand by the extended IntervalMatch() function?

The extended IntervalMatch() syntax (with key fields) is used for handling slowly changing dimensions.
21. What are the new features in QV 11?

Container object, granular chart dimension control, actions such as Clear Field, metadata, etc. are
among the new features in QV 11.
22. Explain joins and their types.

A join is used to combine data from two tables; whenever we use joins to combine data, it is
known as data merging.
It has many types:
a. Left join
b. Right join
c. Inner join, etc.
23. What is a Left Join?

Left join specifies that the join between the two tables should be a left join; the keyword is used before the word
join. The resulting table contains only the combinations of the two tables, with the full data set
from the first table.
24. Define Right Join.

Right join specifies that the join between the two tables should be a right join; the keyword is used before the
word join. The resulting table contains only the combinations of the two tables, with the full data
set from the second table.
25. Explain Inner Join.

Inner join specifies that the join between the two tables should be an inner join. The resulting table
contains only the rows whose key values are present in both tables.
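
A small sketch of the join prefixes (table and field names are illustrative):

Orders:
LOAD OrderID, CustomerID, Amount FROM Orders.qvd (qvd);

LEFT JOIN (Orders)
LOAD CustomerID, CustomerName FROM Customers.qvd (qvd);
// RIGHT JOIN and INNER JOIN use the same syntax with the respective keyword.
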
26. What are modifiers?

Modifiers deal with field names.


For example: sum({$<Region=>} Sales)
returns the sales for the current selection, but with any selection on "Region" removed.
27. Explain Identifiers Syntax?

1. 0 - represents the empty set
2. 1 - represents the full set of records
3. $ - represents the records of the current selection
4. $1 - represents the previous selection
5. $_1 - represents the next (forward) selection
6. Bookmark01 - represents the bookmark named Bookmark01
28. Explain 3-tier architecture of Qlikview Application?

1-tier: raw data is loaded and we create QVDs.


2-tier: the QVDs are transformed according to the business logic and business requirements, and the data model
is created.
3-tier: reading all the QVDs from the 2-tier layer, we make a single QVW.
29. How does Qlikview stores the data internally?

QlikView stores the data in QVDs, as QVDs have data compression capability. QlikView has better
performance than other BI tools because of its in-memory analytics approach.
30. Explain the restrictions of Binary load for a QlikView Developer?

A Binary load can be used for only one application, meaning we can only read the data from one
QVW application, and it must be the first statement in the script.
31. For a QlikView Administrator, differentiate between subset Ratio and Information Density.

Subset Ratio: It is used to easily spot problems in key field associations; it is only relevant for key
fields, since they are present in multiple tables and may not share all of their values.
Information Density: It is the percentage of rows in a field that contain a non-null value.
32. What is the use of Optimized Load?

Optimized load is much faster and preferable, especially for large data sets. It is possible if no
transformations are made at the time of the load and no filtering is done.
33. Differentiate between keep and joins?

Keep and join perform similar functions, but keep retains the two tables whereas join creates
a single combined table. Keep is used before the Load or Select statements.
34. Define synthetic Key?

A synthetic key is the key QlikView creates automatically when two or more tables have more than one common column
between them.

35. What is incremental load in Qlikview Architect?

Incremental load means loading only new or changed records from the database. With the help
of QVD files we can implement incremental load.
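
A minimal insert-and-update incremental-load sketch, assuming an Orders table with an OrderID primary key, a ModifiedDate column, and a vLastReloadTime variable set elsewhere (all names are illustrative):

// 1. Load only the new or changed rows from the database.
Orders:
SQL SELECT OrderID, ModifiedDate, Amount
FROM dbo.Orders
WHERE ModifiedDate >= '$(vLastReloadTime)';

// 2. Append the historical rows from the previous QVD, skipping keys just reloaded.
Concatenate (Orders)
LOAD OrderID, ModifiedDate, Amount
FROM Orders.qvd (qvd)
WHERE NOT EXISTS (OrderID);

// 3. Store the combined result back to the QVD for the next run.
STORE Orders INTO Orders.qvd (qvd);
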
36. Differentiate between set and let option in Qlikview?

Set: it assigns the value to the variable without evaluating the expression.


Let: it evaluates the expression and assigns the result to the variable.
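
A small sketch of the difference:

SET vExpr  = Sum(Sales);   // vExpr holds the text "Sum(Sales)"
LET vToday = Today();      // vToday holds the evaluated result, e.g. 01/10/2017
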
37. Define Qlikview Resident Load.

Resident load is part of loading data into a QlikView application. It is used for loading data from a
table that has already been loaded into the QlikView application.
38. How we can optimize QV application?

It can be optimized by extracting the data into QVDs. When the complete QVW application reads its data
from QVDs, the data is loaded into RAM much faster.
39. What is mapping load?

Mapping load is used to create a mapping table that can be used for replacing field values and
field names.
40. Define apply map.

ApplyMap is used to add fields to a table with the help of a previously loaded mapping table. It can be used as an alternative to joins.
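
A minimal sketch combining the two (all names are illustrative):

CountryMap:
MAPPING LOAD CountryCode, CountryName INLINE [
CountryCode, CountryName
US, United States
BE, Belgium
];

Customers:
LOAD
    CustomerID,
    ApplyMap('CountryMap', CountryCode, 'Unknown') as Country
FROM Customers.qvd (qvd);
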
41. What is concatenation?

Concatenation appends the rows of one table onto another; any tables whose columns and rows correspond to
each other can be combined through concatenation.
42. Define NoConcatenation.

The NoConcatenate prefix is used to force two identical tables to be treated as separate internal tables.
43. Define connect statement.

It is used to establish a connection to a database via the ODBC or OLE DB interface.
44. What do you understand by Fact constellation Schema?

It is a logical database structure of a data warehouse in which multiple fact tables share common
dimension tables.
45. What do you mean by RDBMS?

It stands for Relational Database Management System. It arranges the data into respective columns
and rows.
46. What do you understand by the term CAL in Qlikview Server?

Every client needs a CAL to connect to QlikView Server. The CALs are taken up with
QlikView Server and tied to the server serial number.
47. Differentiate between QV server and publisher?

QV Server is a program that is installed on a computer with various CALs which allow users to
access QV files on the server. Publisher is a program which provides centralized control over our
QV files and manages how and when they are loaded and distributed.
48. What do you understand by snapshot view of the table?

With this option we can see the number of tables and their related associations.


49. How can we bring data into QV?

We use ODBC, OLE DB and SAP connector kinds of data connections.


50. How can we handle early arriving facts?

We can load data from ODBC, OLE DB and SAP connectors via select statements, and we can also
load files like Excel, Word, etc. by using the table-file syntax.
51. What type of data do we generally use?
We use flat files, Excel files, QVDs, etc. as data.

52. Explain about QlikView?

QlikView is the Business Intelligence tool used by the University of St Andrews. Data from
different University systems is combined and presented in a single dashboard in an easy and
understandable way.
QlikView dashboards at the University of St Andrews are built on the following principles:
 Dashboards must be effective to use
 Dashboards must support users in carrying out their tasks
 Dashboards must provide the right kind of functionality
 It must be easy to learn how to use a dashboard
 It must be easy to remember how to use a dashboard

 To use QlikView, you do not need to have technical expertise in information systems,
just a willingness to learn how it can support you.
53. What are the benefits of using QlikView?

As the name suggests, QlikView is a combination of quick and click and these features make it
intuitive and easy to use. Users can visualize data, search multiple data sets, create ad hoc
reports, and view patterns and trends in data that may not have been visible in other reports.
QlikView is
 Flexible – dashboards are web based and accessible from desktop computers and mobile
devices
 Interactive – users are able to drill down and select particular data within charts or tables
 Usable – users can see large amounts of data effectively and efficiently
 Scalable – useful for multiple business processes at analytical, operational and strategic
levels
54. How is QlikView 11 different from QlikView 10?

QlikView 11 brings new levels of capability and manageability to the QlikView Business
Discovery platform. In this release, we focused our investments on five value propositions:
 Improve collaborative decision making with Social Business Discovery
 Gain new insights into opportunities and threats and relative business performance with
comparative analysis
 Expand QlikView usage to additional devices, including smartphones, with mobile
Business Discovery
 Enable a broad spectrum of users to jointly develop QlikView apps with QlikView’s
rapid analytic app platform capabilities
 Improve the manageability and performance of QlikView with new enterprise platform
capabilities.
55. What is QlikView comparative analysis in QlikView Developer platform?

Business users can quickly gain new kinds of insight when analyzing information in QlikView,
with new comparative analysis options. QlikView App developers can now create multiple

selection states in a QlikView app; they can create graphs, tables, or sheets based on different
selection sets.
56. What mobile device platforms does QlikView 11 support?

QlikView 11 delivers mobile functionality for Apple iOS and Android tablets and smartphones.
QlikView supports Android tablets when the following conditions are met:
 QlikView Server version 10 SR3 or later
 The native browser, not a downloaded one
 Currently our HTML5 web apps support only Apple and Android handhelds. Because
many BlackBerry devices are older devices that don't fully support HTML5 (and many are
non-touch), we don't have a web-based solution for them at this time.
57. For QlikView Admin, what is document-level auditing in QlikView 11?

New optional settings within QlikView Management Console enable administrators to more
effectively audit user interactions. Administrators can audit QlikView usage not only at the
system level (the entire QlikView Server), but down to the document level.
58. What are the key differences between QlikView and any other standard statistical software
package (SAS, SPSS)?

The key difference is in terms of the database used. QlikView offers a quite simple visualization that
matches MS Excel filtering. SAS is useful in the case of metadata, while SPSS is good for
analysis.
In comparison of the above three, QlikView is the most user-friendly and the fastest in terms of generating
diverse dashboards/templates.
In terms of calculations, advanced statistics options are limited in QlikView.
For market research and analysis, SPSS has direct facility algorithms.
59. What are QlikView annotations?

With the new annotations collaboration object QlikView users can engage in threaded
discussions about QlikView content. A user can create notes associated with any QlikView
object. Other users can then add their own commentary to create a threaded discussion. Users
can capture snapshots of their selections and include them in the discussion so others can get
back to the same place in the analysis when reviewing notes and comments. QlikView captures

the state of the object (current selections), as well as who made each note and comment and
when, for a lasting record of how a decision was made.
60. What are the main features of QlikView?

QlikView offers the following features:


 Dynamic BI Ecosystem
 Data visualization
 Interacting with dynamic apps, dashboards and analytics
 Searching across all data
 Secure, real-time collaboration

Star schema and snow flake schema

QlikView can handle star schemas and snowflake schemas effectively. A star schema is simple to
understand and is good for reporting, as the number of joins is reduced.

A star schema consists of dimensions and facts. It has a fact table in the middle and dimensions
surrounding the fact. The schema is shaped like a star, hence the name star schema.

 Facts: A fact table contains numeric value. It contains a quantitative value such as sales,
revenue, or profit.
 Dimensions: A dimension table contains textual description. Dimensions provide context to the
facts, for example, sales by product.

Fact tables contain the foreign keys of the dimension tables.

In a snowflake schema, a dimension is not connected directly to the fact table; it is connected to another
dimension.

Normalization vs. De-Normalization

Normalization: Normalization is the process of dividing the data into multiple tables so that data redundancy is removed and data integrity is achieved.
De-Normalization: De-Normalization is the opposite process, where the data from multiple tables is combined into one table so that data retrieval is faster.

Normalization: It removes data redundancy, i.e. it eliminates duplicate data from a table and puts it into a separate new table.
De-Normalization: It creates data redundancy, i.e. duplicate data may be found in the same table.

Normalization: It maintains data integrity, i.e. any addition or deletion of data from a table will not create any mismatch in the relationships of the tables.
De-Normalization: It may not retain data integrity.

Normalization: It increases the number of tables in the database and hence the joins needed to get a result.
De-Normalization: It reduces the number of tables and hence the number of joins, so query performance is faster compared to normalized tables.

Normalization: Even though it creates multiple tables, inserts, updates and deletes are more efficient; if we have to insert/update/delete any data, we perform the transaction in that particular table, so there is no fear of data loss or data integrity issues.
De-Normalization: All the duplicate data sits in a single table, and care must be taken to insert/delete/update all the related data in that table; failing to do so will create data integrity issues.

Normalization: Use normalized tables where many insert/update/delete operations are performed and the joins of those tables are not expensive.
De-Normalization: Use de-normalization where joins are expensive and frequent queries are executed on the tables.

