
A Simple Approach To Multi-Tenant Data Testing
By Melvin Laguren

With all the different types of testing methods performed on a product, the tester always has the following
question in the back of his or her mind:

Have I done enough testing?
This question moves to the front of the tester's mind when the application being tested is multi-tenant, and new
questions begin to form in the back of the mind:

• How can I guarantee that one customer does not see another customer's data?
• What testing method can I use to guarantee it?

Testing this manually and taking a screenshot of each result is a long, tedious process that is prone to user error.
As the number of customers and their data grows, so do the time this testing takes and the chance of error. Another
issue with this approach is that the whole process has to be repeated with each new software release. In the end,
it can become a full-time job for one person.

Automating the manual process is the first step in the right direction. The only drawback here is that the
automated code has to be maintained and updated for each software upgrade.

Both methods require accountability from the tester. The manual process requires the tester to take a snapshot
and store it somewhere, while the automated process is designed to record the testing activity. Both methods
require a lot of planning to ensure that the data is captured correctly.

Where to begin?
When creating your multi-tenant data test solution, you must not only identify the problem; you must also
identify all the components that lead to a solution.

Let's start by formally identifying the problem. Borrowing from agile development, a good way to state the problem
is as a user story:

AS A PROFESSIONAL TESTER, I WOULD LIKE TO BE ABLE TO TEST THAT THE DATA CREATED IN THE APPLICATION CAN ONLY BE
VIEWED BY THE CUSTOMER WHO CREATED IT, SO THAT THE CUSTOMER IS CONFIDENT THAT THEIR INFORMATION CANNOT BE
VIEWED BY ANOTHER.

The “reason” from the user story explains it all. The test solution being developed must make the customer
confident that their information cannot be viewed by another.
Background Information
To begin identifying the solution, the following parameters will be added to the problem:

• The multi-tenant application is an Ajax-based web application.
• There is no budget to purchase tools, and there are currently none at the tester's disposal.
• The customer works in a regulated industry; therefore, the customer will perform an external audit before
accepting the application.

Reviewing these parameters, it is easy to see that the third bullet will be important in developing the solution.
The test results must be clearly documented to convince an auditor that the application has been tested
thoroughly enough to ensure data security.

Identify Possible Solutions
Now examine the first two parameters. The second parameter implies that the tester will have to use a manual
solution.

Manual Solution 1
1. Log into the application as a customer
2. Take screenshots of all web pages that contain customer data
3. Save the screenshots in a folder
4. Log out of the application
5. Log into the application as a different customer and repeat steps 2 through 4
6. Compare the data saved from the first customer to the data saved from the second customer

By testing the application manually (saving screenshots of the pages that display each customer's unique data and
visually comparing the corresponding pages for different customers), this solution addresses the three parameters
given in the problem.

Drawbacks to Manual Solution 1
The first and foremost drawback with this solution is human error. A thorough tester would test more than just two
different customers. As the number of “customers” used for testing increases, the chances the tester could
inaccurately record the data from steps 2, 3 and 4 increase.

Another drawback is that the data being used for testing may fail to uncover real-world problems. This is especially
true if this is the first release of the application. Setting up test data can be time-consuming, and a tester may miss
something, especially if data is being entered for more than two customers.

So, what now? Analyzing the solution has introduced two new problems that need to be accounted for:

• As the number of customers used for testing increases, so does the chance of incorrectly recording data
• As the number of customers increases, there is a chance that the data used will not find a problem
Manual Solution 2
Looking at the parameters again, the third parameter could help solve the new problems discovered. The solution
now includes a second set of eyes.

Manual Solution 2A
1. Log into the application as a customer
2. Take screenshots of all web pages that contain customer data
3. Save the screenshots in a folder
4. Log out of the application
5. Log into the application as a different customer and repeat steps 2 through 4
6. Compare the data saved from the first customer to the data saved from the second customer
7. Repeat steps 1-6 with another tester

Manual Solution 2B
1. Tester A creates data in Test Environment A
2. Tester B creates data in Test Environment B
3. Tester A performs Manual Solution 2A (steps 1-6) on environment B
4. Tester B performs Manual Solution 2A (steps 1-6) on environment A
5. Testers A & B switch environments and repeat Manual Solution 2A (steps 1-6)

Drawbacks to Manual Solution 2
The solution reduces the potential for error because a second set of eyes is involved. What drawbacks exist with
this solution?

The difference between the first and second solutions is that a second tester is involved in the process. What if this
resource is not available? The second parameter says that "there is no budget to purchase tools." Applying this
parameter, hiring another tester is unlikely.

The second solution will give an auditor more confidence than the first. Then again, even if another tester is
available to assist in the process, there is no guarantee that this second solution will ensure information security,
since it does not truly solve the human-error issue discussed in the first solution.

How about automation?
Automation could seem to violate the second parameter. However, since the first parameter says that the application
is web based, a multitude of open source applications are available to automate either of the two processes
mentioned above. Automating the process requires a considerable initial investment, and that investment should be
focused on automating the second solution. Once the solution has been automated, the tester will have more time
to create application data for the test.

Automating the second solution is a very good beginning. The advantage for the tester is that, thanks to the initial
investment, more time can later be spent creating additional data, and only minor updates to the automated scripts
are needed for future versions of the application.
Apart from doing nothing to improve the odds that the test data will expose a problem, the biggest drawback of an
automation tool is that it cannot perform the data comparison. Verifying that the data is unique between customers
is still the responsibility of the tester(s), and this becomes troublesome as the number of comparisons grows.

Problem Solved?
So far, three possible solutions have been discussed. All three follow the same basic pattern:

1. Test the application as a customer
2. Record the data being shown on each page
3. Continue Steps 1 and 2 with a different customer
4. Compare the results to make sure that they are unique

Under a tight deadline and with limited resources, the tester's focus will be on the functionality and performance of
the application, which may lead him or her to believe that as long as any of the three methods is used, the odds of a
data "bleed" are small. Others involved in the software development process may feel that the architecture of the
software will ensure that this "bleed" does not occur, especially when combined with one of these three testing
methods.

This should not put a tester's mind at ease, and it definitely will not put the auditor's mind at ease. So how does the
tester put this issue to rest and focus on everything else that could go wrong? Getting involved early in the
development process helps greatly, especially if the tester has seen the common mistakes that can be made when
designing the application.

Increase Reliability
To increase the reliability of the testing, there are two items that the tester needs to get involved with long before
implementing a repeatable test process. Covering these two items will lead to a much more solid solution. This will
not only convince you and your colleagues that the odds the data could bleed elsewhere are very slim, but will also
convince outside observers that the testing is more than adequate.

Common Errors
The first item is to understand the two most common errors that lead to data bleeding, and how to spot them.

Missing Foreign Keys
In the design of the database that stores the information, it is very rare for a direct relationship between two
tables to exist without a foreign key between the parent table and the child table. The more common mistake
occurs as more relationships are added to a table.

To illustrate the problem, suppose one of the requirements for an application is that a contractor can return to
their bid and make changes before finalizing it.
PROJECT: id, name, description, closing_date
BID: id, project_id, amount, final
CONTRACTOR: id, contractor_name

Figure 1

Being involved at the design phase, the tester would be able to see that the tables designed in Figure 1 do not
satisfy this requirement. The BID table has no link to the contractor who placed the bid, so if the application
displays the bids for a project, a contractor will be able to see all the bids made on any project they are interested in.

Catching this early in the design ensures that each bid stored in the BID table is associated with the CONTRACTOR
table as well as the PROJECT table (Figure 2).

PROJECT: id, name, description, closing_date
BID: id, project_id, contractor_id, amount, final
CONTRACTOR: id, contractor_name

Figure 2

The Where Clause
Forgetting the WHERE clause, or leaving a condition out of it, is another common error that can result in data
"bleeding". Even with the improvements made to the database in Figure 2, consider the following query:

SELECT PROJECT.NAME, BID.AMOUNT, BID.FINAL
FROM PROJECT INNER JOIN BID ON PROJECT.ID = BID.PROJECT_ID
WHERE PROJECT.ID = 1;

If the web application executes this query, the contractor will see every bid made on the project. Depending on
further requirements, the contractor might even be able to edit or delete other contractors' bids. A tester reviewing
the query would catch this error, and the following correction would be made:

SELECT PROJECT.NAME, BID.AMOUNT, BID.FINAL
FROM PROJECT INNER JOIN BID ON PROJECT.ID = BID.PROJECT_ID
WHERE PROJECT.ID = 1 AND BID.CONTRACTOR_ID = :contractor_id;

(:contractor_id is a placeholder for the ID of the contractor who is logged in.)

Being involved in the design process and catching these common mistakes will decrease the odds of customers
accessing data they should not have access to.
Mind Mapping
The second important item is to know the data model. Understanding the database tables, and what is accessible
from the user interface, will help you decide where to focus the testing. For example, if every customer has access
to the same contractor, the test does not need to check whether the contractor is unique per customer. If a
contractor works for several customers, however, it is very important to verify that the application does not let a
customer see the other customers who use the same contractor.

Mind mapping is an ideal technique for drawing out which information must be isolated from other users of the
application, and many free tools can help create mind maps. Figure 3 was created using a free tool called
FreeMind.

Figure 3

With the mind map created, it is easy to see the key data that should be isolated. Now the automation tool can
focus on accessing the web pages that display this information.

Developing The Automated Solution
Since the application is web based, there is an abundance of open source tools that can be used. The tool of choice
depends on the overall solution being developed. The ideal approach would be a simple script that will execute the
test and return a report.

Gather Data → Compare Data → Generate Report
Gathering Data Using JMeter
JMeter, a functional and load testing tool, is well suited to the first part of the test. Because it is built for load
testing, it can log into the application and navigate its pages with many simulated users at the same time. Unlike the
manual method, which logs in one customer at a time, JMeter logs in all N test customers at once.

Figure 4

JMeter also provides a post processor component, "Save Responses to a file". This component reads the server's
response to each request made by JMeter and writes it to a file. Place this component after each HTTP request that
calls a web page displaying customer-only data. As Figure 4 shows, the component can add a prefix to the name of
each file it writes. It is very important to give each request its own unique prefix, for example User1.xml, User2.xml,
etc. In Figure 5, each file prefix is unique, so that the comparisons can later be made between similar files.

Figure 5
Finally, JMeter can be executed from a script. This will make it easier to integrate into a multi-tenant testing
application, especially since getting the files is only the beginning of the problem.

Shell Scripting with Cygwin
Early on, it was noted that one drawback of the automated solution is that the automation tool cannot perform the
comparison, which leaves that task to the tester. A scripting language can do the same task, and shell scripting is
one option. Because JMeter runs on both Windows and *nix computers, shell scripting is a good fit: on Windows, the
same scripts can run under Cygwin, a Linux-like environment for Windows.

What should the shell script accomplish? Since the script must perform several tasks, it is best to create a separate
script for each of the following:

1. Execute JMeter
2. Gather the files created by JMeter and group them so that they are easy to compare
3. Remove excess information from the files to simplify the comparison [1]
4. Perform the comparison
5. Create a report

Finally, one more script can be created to run each of the scripts above in order.
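A minimal sketch of such a driver script follows, assuming each task lives in its own script file and reports failure through a non-zero exit status. The function name run_pipeline and the stage script names in the usage line are hypothetical.

```shell
#!/bin/sh
# run_all.sh (hypothetical name): run each stage script in order and stop
# at the first one that fails, so later stages never work on bad input.
run_pipeline() {
  for step in "$@"; do
    echo "=== running $step"
    if ! sh "$step"; then
      echo "=== $step failed; stopping" >&2
      return 1
    fi
  done
  echo "=== all steps completed"
}
```

For example: run_pipeline run_jmeter.sh gather_results.sh clean_up.sh compare.sh report.sh. Stopping at the first failure keeps later stages from comparing incomplete or missing files.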

Run JMeter
The first script is straightforward. It navigates to the bin folder of JMeter's installation directory and runs
JMeter in non-GUI mode with the command './jmeter -n -t multi-tenant.jmx', where multi-tenant.jmx is the test plan
created in JMeter. To make the script more robust, it can first run a few simple checks to confirm that the
conditions for running JMeter are valid.

The first check makes sure that the number of threads defined in multi-tenant.jmx is less than or equal to the
number of lines in the CSV configuration file used for logging in. If more threads are defined, JMeter starts again at
the beginning of the CSV file to get the parameters for the next thread, so two threads log in as the same customer.
This will result in duplicate data further down the line.

The second check makes sure that the test is set up to run correctly. If multi-tenant.jmx defines only 1 thread, the
test is invalid, because at least two customers are needed for a comparison. If the CSV configuration file referenced
in multi-tenant.jmx does not exist, there is also a problem.

Adding these two checks will definitely make a stronger testing tool.
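Both checks could be sketched as a single shell function like the one below. It assumes the test plan contains one Thread Group whose thread count is a literal number (stored in the .jmx file as the ThreadGroup.num_threads property), and that the login CSV file, passed in explicitly here, has one customer per line and no header row. The function name check_preconditions is hypothetical.

```shell
#!/bin/sh
# Pre-run checks for the JMeter stage: a minimal sketch.
# Assumptions: a single Thread Group with a literal thread count stored as
#   <stringProp name="ThreadGroup.num_threads">N</stringProp>
# and a login CSV (path passed in) with one customer per line, no header.
check_preconditions() {
  jmx=$1
  csv=$2

  if [ ! -f "$jmx" ]; then
    echo "ERROR: test plan '$jmx' not found"
    return 1
  fi
  if [ ! -f "$csv" ]; then
    echo "ERROR: login CSV '$csv' not found"
    return 1
  fi

  # First numeric thread count found in the .jmx file.
  threads=$(sed -n 's/.*name="ThreadGroup.num_threads">\([0-9][0-9]*\)<.*/\1/p' "$jmx" | head -n 1)
  # Non-empty lines in the CSV, i.e. available customer logins.
  logins=$(grep -c . "$csv")

  if [ -z "$threads" ]; then
    echo "ERROR: could not read a numeric thread count from '$jmx'"
    return 1
  fi
  # One thread means one customer: there is nothing to compare.
  if [ "$threads" -lt 2 ]; then
    echo "ERROR: $threads thread(s); at least 2 customers are needed"
    return 1
  fi
  # More threads than logins would make JMeter reuse CSV rows.
  if [ "$threads" -gt "$logins" ]; then
    echo "ERROR: $threads threads but only $logins logins in '$csv'"
    return 1
  fi
  echo "OK: $threads threads, $logins logins"
}
```

For example, check_preconditions multi-tenant.jmx logins.csv || exit 1 could run just before the JMeter command (logins.csv is a hypothetical file name).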

Gather the Results
Once JMeter has completed its run, the next task is to gather the saved files and place them in a single folder to
help identify the test run. Within that folder should be a folder for each of the different data sets used for
comparison.

[1] Since JMeter writes the full response from the server to a file, the files also contain additional information
(headers, HTML code, etc.).
Again, to make the script more robust, a simple file count in each subfolder will show whether the correct number
of files exists. A file is created for every request, even one whose response contains no data, so each subfolder
should hold one file per thread defined in 'multi-tenant.jmx'. This confirms that the threads produced the correct
number of responses.
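A sketch of the gathering step with the file-count check follows. It rests on a hypothetical naming assumption: each response was saved as <dataset>_User<N>.xml (for example projects_User3.xml), where <dataset> is the unique prefix chosen for that request. Each data set gets its own subfolder in the run folder, and the count check expects one file per thread. gather_results is a hypothetical name.

```shell
#!/bin/sh
# Gather step: a minimal sketch.
# Assumption: responses are named <dataset>_User<N>.xml, e.g.
# projects_User3.xml (hypothetical naming, not a JMeter default).
gather_results() {
  src=$1        # folder JMeter wrote the response files into
  run=$2        # folder for this test run
  expected=$3   # number of threads (customers) in multi-tenant.jmx

  mkdir -p "$run" || return 1
  for f in "$src"/*_User*.xml; do
    [ -e "$f" ] || continue          # the glob matched nothing
    name=${f##*/}                    # projects_User3.xml
    dataset=${name%%_User*}          # projects
    file=${name#"$dataset"_}         # User3.xml
    mkdir -p "$run/$dataset"
    mv "$f" "$run/$dataset/$file"
  done

  # Every data-set folder should hold exactly one file per thread.
  status=0
  for dir in "$run"/*/; do
    [ -d "$dir" ] || continue
    count=$(find "$dir" -type f | wc -l | tr -d ' ')
    if [ "$count" -ne "$expected" ]; then
      echo "MISMATCH: $dir has $count files, expected $expected"
      status=1
    fi
  done
  return "$status"
}
```

A run folder name that includes a timestamp (for example run_20240101_1200) makes each test run easy to identify later.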

File Clean Up
Since the parameters state that the multi-tenant application is an Ajax-based web application, it is assumed that
the data transported from the server to the client browser is in some form of XML. Each file can now be cleaned up
so that only the XML data in question remains. The other important task is to make sure that each individual
element, and each of its attributes, is on its own line. This makes the file comparison easier.

By this point, there is probably no need for additional error checking before the clean up. If anything, a post-check
could be performed to see whether any of the files to be used by the next script are blank.
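One way to sketch the clean up with standard tools (sed and awk) is shown below. It assumes the XML payload begins at the first line containing <?xml and that attribute values are double-quoted; a real response format may need different rules. The function name clean_response is hypothetical.

```shell
#!/bin/sh
# Clean-up step: a minimal sketch.
# Assumptions: the XML payload starts on the first line containing "<?xml",
# and attribute values are double-quoted.
# Stage 1 (sed -n): drop everything before the payload, e.g. HTTP headers.
# Stage 2 (sed): put every tag and every text node on its own line.
# Stage 3 (awk): split each opening tag into "<name>" plus one line per
# attribute, trim whitespace, and drop the blank lines left by the split.
clean_response() {
  sed -n '/<?xml/,$p' "$1" |
    sed -e 's/</\
</g' -e 's/>/>\
/g' |
    awk '
      /^<[A-Za-z]/ {
        match($0, /^<[^ \t\/>]+/)
        print substr($0, RSTART, RLENGTH) ">"
        rest = substr($0, RLENGTH + 1)
        while (match(rest, /[A-Za-z_:][-A-Za-z0-9_:.]*="[^"]*"/)) {
          print substr(rest, RSTART, RLENGTH)
          rest = substr(rest, RSTART + RLENGTH)
        }
        next
      }
      { gsub(/^[ \t\r]+|[ \t\r]+$/, "") }
      length($0) > 0 { print }
    '
}
```

For example, clean_response User1.xml > tmp && mv tmp User1.xml cleans a file in place, so the file names used by the later scripts do not change. A response such as <project id="1" name="Acme">Bid A</project> becomes the lines <project>, id="1", name="Acme", Bid A and </project>.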

Comparison
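Now comes the time-consuming and confusing part of the test: comparing the files within each group. The more files
generated for each data set, the longer it takes. To be exact, for a data set containing n files (one per customer),
the number of pairwise comparisons is:

$$\sum_{i=1}^{n} (n - i) = \frac{n(n-1)}{2}$$

For example, 5 customers require 10 comparisons per data set.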

When writing the script, the first thing it needs to know is the number of data comparisons, that is, the number of
folders created by the second script. Since the mind map created in Figure 3 identifies three groups of data, a check
can be performed to make sure that three folders were created, one for each group of data sets being compared.

ls -d */ > folders.txt
if [ $(wc -l < folders.txt) -ne 3 ]; then
  echo "Expected 3 data set folders, found $(wc -l < folders.txt)" >&2; exit 1
fi

The next part of the script navigates to each folder and runs the comparison there. As before, a file is generated
that lists the files in the folder to be compared.
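Inside each data-set folder, that list (datafiles.txt, the file read by the comparison loop) can be built from the file names. This sketch assumes the User<N>.xml naming used by the comparison script and writes one customer number per line, sorted numerically. make_datafiles is a hypothetical name.

```shell
#!/bin/sh
# Build datafiles.txt for the current data-set folder: one customer number
# per line, taken from file names of the form User<N>.xml (assumed naming).
make_datafiles() {
  for f in User*.xml; do
    [ -e "$f" ] || continue   # no User*.xml files in this folder
    n=${f#User}               # User12.xml -> 12.xml
    echo "${n%.xml}"          # 12.xml -> 12
  done | sort -n > datafiles.txt
}
```

Sorting numerically keeps User10 after User9, which a plain alphabetical listing would not.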

Before working out how to run the correct number of comparisons, the actual comparison script (referred to as
datacompare.sh) should be addressed. Every *nix system provides a diff command, and the natural inclination
would be to use it in the shell script. It is the wrong command here: diff is order-sensitive and reports the changes
needed to turn file A into file B, so it does not directly answer whether a given line of file A appears anywhere in
file B. For that, the ever-popular grep command is more than enough.

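# datacompare.sh
# Append to results.txt every line of file1.txt that also appears
# anywhere in file2.txt.
while IFS= read -r line; do
  [ -n "$line" ] || continue            # skip blank lines
  # -F: fixed string, not a regex; -x: whole-line match
  grep -F -x -e "$line" file2.txt >> results.txt
done < file1.txt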
Replace file1.txt and file2.txt with $1 and $2 respectively, and datacompare.sh will take the two files to compare as
its two parameters.
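As an alternative to looping over file A line by line, grep can read all of file A's lines as patterns in a single call. This is a swapped-in variant, not the script described above. With -x, a blank line in the pattern file matches only blank lines, which the later blank-line clean up removes. shared_lines is a hypothetical name.

```shell
#!/bin/sh
# Print every line of the second file that exactly matches some line of
# the first file, once each.
#   -F  treat each line of $1 as a fixed string, not a regular expression
#   -x  match whole lines only
#   -f  read the patterns from the file $1
shared_lines() {
  grep -F -x -f "$1" "$2" | sort -u
}
```

For example, shared_lines User1.xml User2.xml > 1_to_2.data produces the same kind of result file as the loop-based version.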

The comparisons between data files can be managed with a nested while loop that uses the number JMeter added
before each data file's extension.

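# comparison_manager.sh – Performed on folder XXX
# datafiles.txt lists one customer number per line (1, 2, 3, ...)
while read -r i; do
  while read -r j; do
    # Compare each pair of customers once (i < j)
    if [ "$i" -lt "$j" ]; then
      : > results.txt            # start every pair with an empty results file
      ./datacompare.sh "User$i.xml" "User$j.xml"
      # uniq only drops adjacent duplicates, so sort first
      sort -u results.txt > "${i}_to_${j}.data"
    fi
  done < datafiles.txt
done < datafiles.txt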

Above is the basic comparison algorithm. When it completes, the result is a set of files, one for each pair of
customers compared.

Analyze and Report
The solution is almost complete. All that is left is to comb through the various result files to determine if there is
any data in question. Just like gathering the files created by JMeter, the same technique will apply to the files
generated by the comparison script.

Once the files have been grouped, the first thing to do is clean up the data in them. Why? Because the collected data
is stored in XML, the XML tags will show up as a match in every comparison. Removing them makes it extremely easy
to see whether there is a problem: files without any real matches will be blank. A simple sed command can then
remove the blank lines in each file [2]:

sed -i '/^$/d' "$1"
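The XML tags themselves can be removed with sed as well. The sketch below deletes lines that contain nothing but a single tag, along with blank lines, and rewrites the file through a temporary copy instead of relying on sed -i. It assumes the earlier clean up left each tag on its own line; strip_tags is a hypothetical name.

```shell
#!/bin/sh
# Remove lines holding nothing but one XML tag (e.g. <project>, </project>,
# <?xml ...?>) and blank lines. Writes to a temporary file and moves it
# back, since not every sed supports -i.
strip_tags() {
  sed -e '/^[[:space:]]*<[^>]*>[[:space:]]*$/d' \
      -e '/^[[:space:]]*$/d' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

After this step, a result file that still has any lines holds attribute values or text shared by two customers, which is exactly what the report needs to flag.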

After the removal of xml tags and blank lines from the different comparison result files, the following script below
will create a report file that will only contain the names of the files that may need further investigation.

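# report.sh – Performed on folder XXX
# Append the name and contents of every comparison result file that
# still has content after clean up to XXX.rpt
for file in *.data; do
  [ -e "$file" ] || continue      # no result files in this folder
  if [ -s "$file" ]; then         # non-empty: possible data bleed
    echo "$file" >> XXX.rpt
    cat "$file" >> XXX.rpt
  fi
done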

[2] If your sed does not support the '-i' option, redirect the output to a temporary file and move it back over the
original instead.
Provided that both the JMeter test case and the shell scripts were adequately tested during development, the
generated report will give accurate information about the application being tested.

Upgrade and Expansion
Like any automated script, this solution will stay in maintenance mode as the application being tested grows.
Maintenance is easy because the solution is divided into the following steps:

1. Execute JMeter to gather the HTTP responses from the web-based application and save them in data files.
2. Execute a shell script to group the data files and clean them up, leaving only the data identified in the mind map.
3. Execute a shell script to compare the data groups and create comparison result files.
4. Execute a shell script to group the comparison result files and clean them up.
5. Analyze the clean comparison result files.
6. Generate a report that identifies any problems.

If designed properly, this solution can be adapted to any other multi-tenant web-based application by making minor
changes to the various shell scripts and the single JMeter test case. For applications that are not web based, the
method can be applied by replacing JMeter with an emulator and making the necessary modifications to the
remaining scripts.