2009 Fourth International Conference on Software Engineering Advances

A Comparative Study on Automated Software Test Oracle Methods

Seyed Reza Shahamiri, Wan Mohd Nasir Wan Kadir, Siti Zaiton Mohd-Hashim
Department of Software Engineering Faculty of Computer Science and Information System Universiti Teknologi Malaysia Skudai, Malaysia admin@rshahamiri.com, wnasir@utm.my, sitizaiton@utm.my
Abstract—Software testing is used to find software faults in order to improve software quality. To verify software behavior, testers require a test oracle. A test oracle is a reliable source of expected software behavior that provides outputs for any input specified in the software specifications, together with a comparator to verify actual results. While test automation requires automated oracle support, oracle automation is considered a challenging task. The challenges arise from the automation required in expected output generation and results verification. This paper presents the oracle activities and the challenges of preparing an automated oracle. A comparative study of existing automated oracle and expected output generation methods is then provided. Finally, a classification of these methods is suggested, based on how they provide an automated test oracle and the tool they use. The classification explains which oracle activities are automated by the proposed approaches.
Keywords - Software Engineering; Software Testing; Automated Software Testing; Automated Test Oracle; Expected Output Generation.

automatically, none of them could completely automate all test oracle activities in all circumstances. This paper explains the process of using a test oracle and its activities. Then the challenges of providing a complete and automated oracle are explained. Finally, after existing automated oracles are described, a comparative study is provided. It shows which test oracle activities are automated by the proposed methods and how they overcome the challenges. In addition, since one of the major challenges in developing an automatic test oracle is expected output generation, existing methods to automate expected output generation are considered in the comparative study.

II. TEST ORACLE PROCESS AND CHALLENGES

After testers execute test cases and collect the outputs of the AUT, they need to decide whether these outputs are correct in order to determine the correctness of the software behavior. To make this decision, they need correct outputs to compare with the outputs the software generated. In the testing literature, the outputs generated by the software that need to be evaluated are called actual outputs, and the correct outputs that are used to evaluate them are called expected outputs. Therefore, an oracle is a complete and reliable source of expected outputs together with a tool to find faults. Put simply, this tool compares actual and expected outputs. The difficulty of finding correct and reliable expected outputs is called the oracle problem [1]. According to [2], oracle information and oracle procedure are the building blocks of a test oracle. The former is a source of expected outputs and the latter is the comparator. Modifying the oracle information or using a different comparator may result in different oracles. Possible test oracle process activities are:
1) Generate expected outputs
2) Save the generated outputs
3) Execute the test cases
4) Compare expected and actual outputs
5) Decide if there is a fault or not
Figure 1 depicts the oracle process and its activities.
Note that test case execution is not part of the test oracle, but it is part of the oracle process.
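The five activities listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not a method from the surveyed literature: `aut_abs`, `TableOracle` and `run_oracle_process` are hypothetical names, the expected outputs are written by hand, and the application under test is a toy function seeded with one fault.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def aut_abs(x: int) -> int:
    """Hypothetical AUT: should return |x|, but is seeded with a fault at x == -1."""
    return x if x >= -1 else -x


class TableOracle:
    """Oracle information (stored expected outputs) plus oracle procedure (comparator)."""

    def __init__(self) -> None:
        self._expected: Dict[int, int] = {}

    def save(self, test_input: int, expected_output: int) -> None:
        # Activity 2: save a generated expected output.
        self._expected[test_input] = expected_output

    def passes(self, test_input: int, actual_output: int) -> bool:
        # Activities 4 and 5: look up the expected output, compare, decide.
        return self._expected[test_input] == actual_output


def run_oracle_process(
    aut: Callable[[int], int], oracle: TableOracle, inputs: Iterable[int]
) -> List[Tuple[int, int]]:
    """Return a fault report: (input, actual output) for every failing test case."""
    faults = []
    for x in inputs:
        actual = aut(x)  # Activity 3: execute the test case (outside the oracle itself).
        if not oracle.passes(x, actual):
            faults.append((x, actual))
    return faults


oracle = TableOracle()
# Activity 1: expected outputs, here produced manually from the specification of |x|.
for test_input, expected in [(-3, 3), (-1, 1), (0, 0), (4, 4)]:
    oracle.save(test_input, expected)

fault_report = run_oracle_process(aut_abs, oracle, [-3, -1, 0, 4])
print(fault_report)  # [(-1, -1)]
```

In this sketch only saving, searching and comparison are automated; generating the expected outputs remains manual, which is the main challenge discussed in this paper.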



I. INTRODUCTION

In order to improve software quality and reliability, software testing is used as a process of finding errors and failures in software products. Because software testing is very expensive in terms of time, money and resources, complete testing is practically impossible. Test automation is one of the main approaches that has been applied to decrease testing costs. Test automation requires automated test execution and automated results verification; the latter is called an Automated Test Oracle, so oracle automation is needed to support a test automation framework. A test oracle is a reliable source of expected outputs. It is used to verify the results of test cases executed on the Application Under Test (AUT). A Complete Test Oracle must provide correct outputs for any input specified in the software specifications. Since manual and human oracles are costly and unreliable, automated and complete test oracles are required to ensure testing quality while reducing testing costs. Due to the challenges in providing complete test oracles, it can be expensive and sometimes impossible to provide a complete oracle reliably. Although several studies have been conducted to provide test oracles
978-0-7695-3777-1/09 $26.00 © 2009 IEEE DOI 10.1109/ICSEA.2009.29 140

Authorized licensed use limited to: University of York. Downloaded on July 05,2010 at 13:53:51 UTC from IEEE Xplore. Restrictions apply.

Figure 1. Test Oracle Process

Figure 2. Using an automated test oracle

(Labels recoverable from the two diagrams: Test case inputs, Generate expected outputs, Saving, Expected outputs, Automated Oracle, AUT, Test case execution results, Actual output, Expected output, Comparator, Fault report.)

There are challenges in developing a complete test oracle. The first and most important one is generating expected outputs and building a database of them. Generally, expected outputs are generated manually, based on program specifications or on the programmer's knowledge of how the software should behave, then memorized (Direct Verification [1]) and looked up manually in the database. The problem is that the number of expected outputs can be very large in real-world applications. Therefore, manual output generation and search can be too expensive and difficult.

There is little literature available on expected output generation. Peters and Parnas [3] suggested applying a table of pairs to provide the expected outputs. They used a relational model of the software behavior and represented it as tabular expressions. Bousquet et al. [4] generated test data from formal descriptions such as software environment constraints, functional and safety-oriented properties, software operational profiles and software behavior patterns. They expressed the formal descriptions as logical expressions that must be satisfied by the software. Dillon and Ramakrishna [5] described a tableau algorithm based on temporal constraints that must not be violated during software execution. Multi-paradigm and multilingual specifications for reactive systems were provided too [6, 7]. A process of deriving tests from Z specifications was shown in [8, 9], and their application as a model-based specification for defining a test template framework followed in [10]. These studies have shown that automatic expected output generation is the main challenge in providing an automated test oracle.

Another challenge is looking up the expected outputs for each test case input in the expected outputs database. Manual searching may slow the testing process significantly. Therefore, it is highly recommended to automate the oracle save and search process.

III.

An automated oracle requires a simulated model of the AUT. In order to provide a reliable oracle, the simulated model should behave like the software under test and automatically generate correct expected outputs for every possible input specified in the software documentation. There are a few studies on automated oracles. The following subsections explain them in detail.

A. N-Version Diverse Systems and M-Model Program Testing

Manolache and Kourie [11] suggested an approach based on an N-Version Diverse system. N-Version Diverse is a testing system based on various implementations of a program. To put it differently, testers use several versions of the AUT with independent implementations, all of which implement the same functionality. These versions are then applied as the test oracle for the AUT. This method is a form of Redundant Computation, in which a different implementation serves as the gold version of software behavior [1]. A gold version is a trusted implementation of the AUT. This idea can result in an expensive process. Moreover, the method cannot guarantee the efficiency of the testing process. Therefore, the authors explained another solution based on N-Version testing, called M-Model Program testing (M-mp). The new approach aims to reduce the cost of the former approach and to increase the reliability of the testing process by providing a more precise oracle. M-mp testing implements different versions only of the functions to be tested, whereas N-Version Diverse implements several versions of the whole software.

B. Decision Table

The use of a decision table as a test oracle was studied in [12]. The authors applied decision tables in unit and integration testing of web-based applications that include both client and server pages. A model of software behavior is presented using a decision table. A decision table is a model for representing software requirements. It is applied wherever many conditions affect the software responses.
A decision table consists of a condition section, which presents combinations of conditions, and an action section, which presents the software responses when particular conditions are satisfied. Each row in the table presents a variant as a unique combination of conditions. Table I shows a template of a decision table.
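A decision table maps each variant (a unique combination of conditions) to the expected action, so it can be stored as a simple lookup structure. The sketch below is an assumption-laden illustration rather than the approach of [12]: the login page, its two conditions and the action names are hypothetical, and the comparison loop is added only for demonstration, since the table itself merely stores and indexes the expected responses.

```python
from typing import Callable, Dict, List, Tuple

# Conditions of a hypothetical login page: (user_exists, password_correct).
Conditions = Tuple[bool, bool]

# One entry per variant: a unique combination of conditions and its expected action.
DECISION_TABLE: Dict[Conditions, str] = {
    (True, True): "show_home_page",
    (True, False): "show_password_error",
    (False, True): "show_unknown_user_error",
    (False, False): "show_unknown_user_error",
}


def login_page(user_exists: bool, password_correct: bool) -> str:
    """Hypothetical server page under test, seeded with a fault."""
    if not user_exists:
        return "show_unknown_user_error"
    # Fault: the password check is missing, so every known user is let in.
    return "show_home_page"


def failing_variants(
    aut: Callable[[bool, bool], str], table: Dict[Conditions, str]
) -> List[Conditions]:
    """Run every variant and collect those whose action differs from the table."""
    return [
        conditions
        for conditions, expected_action in table.items()
        if aut(*conditions) != expected_action
    ]


failures = failing_variants(login_page, DECISION_TABLE)
print(failures)  # [(True, False)]
```

The expected actions in `DECISION_TABLE` still have to be written by hand from the requirements, which matches the limitation that a decision table does not generate expected outputs automatically.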

An automatic test oracle can be applied to overcome the oracle challenges and provide a complete test oracle. The process of using an automated oracle in software testing is shown in Figure 2 [16].



Table I. Template of a decision table (column headings)
Variant | Input Section: Input Variables, Input Actions, State before test | Output Section: Expected results, Expected output sections, Expected state after test
… | … | …

the two states and find faults in the GUI. Figure 4 shows this AI planning based test oracle.

E. ANN Based Test Oracle

Artificial Neural Networks (ANNs) have previously been applied successfully in software testing. They have a significant capability to simulate software behavior by learning from <input, output> pairs. Using an ANN as an oracle is a black-box testing technique. Black-box testing considers the accuracy of the final outputs and the software behavior rather than the internal software structure. Using an ANN as a test oracle requires generating a neural network that acts as the oracle and simulates the software behavior. An ANN based test oracle needs I/O pairs as training patterns to simulate the software behaviour. Since ANNs can memorize or learn from I/O pairs, it is possible to apply them as a test oracle.

Vanmali and his colleagues proposed an approach that applies ANNs as a test oracle [16]. They modeled an ANN to simulate software behavior using a previous version of the AUT, and applied this model as an automated test oracle in regression testing. The approach was evaluated by testing a small CreditApproval application. To put it differently, a gold version of the AUT is provided for regression testing. The previous version of the software is used to generate outputs and provide the I/O pairs for training the ANN. This oracle can only test the functionality that is unchanged in the new version to be tested. The results of this study have shown that ANN based test oracles are reliable for testing data-centric applications. The process of using an ANN based oracle in regression testing is shown in Figure 5.

Aggarwal et al. [17] studied the same approach for the triangle classification problem. The ANN based test oracle was applied to test a small application that implemented triangle classification. Their work was followed by [27]. Both of the approaches mentioned above were presented to model and test discrete functions. Mao and his colleagues formulated an ANN as a test oracle to test continuous functions [18].
Consider a continuous function y = F(x), where x is the software input vector, y is the corresponding output vector and F is the AUT (a continuous function). This approach modeled F and generated the expected output vector. All of the above studies used Perceptron feed-forward neural networks. Lu and Mao [19] applied RBF neural networks to develop an automated oracle and used it to model and test a small continuous mathematical function. Although all of these studies have shown the significance of ANNs as automated test oracles, little
(Figure 4 diagram labels: Test Case, Oracle, Formal GUI Model, Expected-state Generator, Expected State, Verifier, Actual State, Execution Monitor, Run-time information adapted from GUI execution.)

C. IFN Regression Tester

There have been several attempts to apply Artificial Intelligence (AI) methods to simulate the AUT behavior and use the simulation as a test oracle. These methods vary according to the AI method applied. Last and his colleagues [13, 14] introduced a fully automated black-box regression tester using an Info-Fuzzy Network (IFN). IFN is an approach developed for knowledge discovery and data mining. The interactions between the input and the target attributes of any type (discrete and continuous) are represented by an information-theoretic connectionist network. An IFN presents the functional requirements by an “oblivious” tree-like structure, where each input attribute is associated with a single layer and the leaf nodes correspond to combinations of input values [13]. The authors developed an automated oracle that can generate test cases, execute them, and evaluate them automatically based on previous versions of the AUT. The structure of their method is shown in Figure 3. As can be seen, the Random Test Generator provides test case inputs derived from the Specification of System Inputs. These specifications contain information about system inputs such as data types and value domains. The Test Bed executes these inputs on the Legacy Version (the previous version of the AUT) and receives the system outputs. Next, these test cases are used to train and model the IFN as an automated oracle. This oracle can therefore detect faults in the new version of the AUT. The method can completely automate test case execution and evaluation in regression testing.

D. AI Planner Test Oracle

Memon et al. [15] applied AI planning as an automated GUI test oracle. In order to derive the GUI's expected states automatically during test case execution, the internal behaviour of the GUI is modeled using a representation of GUI elements and actions. A formal model composed of GUI objects and their specifications is designed based on GUI attributes and applied as the oracle.
In this model, GUI actions are defined by their preconditions and effects, and expected states are generated automatically using the model. The actions are derived from test cases. Similarly, the actual states are described by a set of objects and their properties, and are obtained by the oracle from an execution monitor. In addition, the oracle has a verifier to automatically compare
Figure 3. Using IFN for running and evaluating test cases
(Diagram labels: Specification of System Inputs, Random Test Generator, Test case Inputs, Test Bed, Legacy Version, System inputs, System outputs, Test case Outputs, Test Cases, Test Library, IFN Induction Algorithm, IFN Structure, IFN Model, AUT, Fault report.)

Figure 4. AI Planning based Test Oracle
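The expected-state generation and verification described for the AI planning based oracle can be illustrated with a small sketch. It assumes a GUI state is a flat mapping from property names to values and that each action carries its preconditions and effects; the `Action` class, the dialog example and the "monitored" actual states are hypothetical and are not taken from [15].

```python
from dataclasses import dataclass
from typing import Dict, List

State = Dict[str, object]  # GUI state: property name -> value (an assumed encoding)


@dataclass
class Action:
    name: str
    preconditions: State  # properties that must hold before the action
    effects: State        # properties the action sets


def expected_states(initial: State, actions: List[Action]) -> List[State]:
    """Expected-state generator: apply each action's effects to the current state."""
    states = []
    current = dict(initial)
    for action in actions:
        for prop, value in action.preconditions.items():
            if current.get(prop) != value:
                raise ValueError(f"precondition of {action.name} not met: {prop}")
        current = {**current, **action.effects}
        states.append(current)
    return states


def verify(expected: State, actual: State) -> List[str]:
    """Verifier: names of properties whose actual value differs from the expected one."""
    return sorted(p for p in expected if actual.get(p) != expected[p])


open_dialog = Action(
    "open_dialog",
    preconditions={"dialog.visible": False},
    effects={"dialog.visible": True, "ok_button.enabled": True},
)
close_dialog = Action(
    "close_dialog",
    preconditions={"dialog.visible": True},
    effects={"dialog.visible": False, "ok_button.enabled": False},
)

initial_state = {"dialog.visible": False, "ok_button.enabled": False}
expected = expected_states(initial_state, [open_dialog, close_dialog])

# Stand-in for states reported by an execution monitor; the first one is faulty.
actual = [
    {"dialog.visible": True, "ok_button.enabled": False},
    {"dialog.visible": False, "ok_button.enabled": False},
]
mismatches = [verify(e, a) for e, a in zip(expected, actual)]
print(mismatches)  # [['ok_button.enabled'], []]
```

The quality of such an oracle depends entirely on the formal model: an action with wrong or missing effects produces wrong expected states.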



(Figure 5 diagram labels. Training Phase: Previous Version of the AUT, Program Output, ANN training, Trained ANN. Testing Phase: Test Case Input, AUT, Program Output, Trained ANN as Oracle, ANN Output, Comparison Tool, Fault report.)

changed input. But, similar to dynamic analysis, it cannot guarantee finding all of the I/O relationships [25]. Schroeder et al. [26] used I/O relationship analysis to generate a reduced set of expected outputs at adequate cost. Their study showed how to produce expected results for a small portion of the inputs and generalize them to generate other test cases. An I/O analysis is performed to discover which outputs are affected by which inputs. Then a reduced set of test cases is created manually. Finally, expected results are generated based on the reduced set of test cases and generalized to provide the remaining test cases automatically.

IV.

Figure 5. Regression test ANN based oracle
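The two phases of a regression oracle of this kind can be sketched as follows. This is a toy under stated assumptions, not the network used in [16]: a single perceptron stands in for the neural network, `legacy_approve` and `new_approve` are hypothetical credit-approval rules, the new version is seeded with a threshold fault, and the oracle is evaluated on the same inputs it was trained on.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[int, int]  # (income, debt): hypothetical inputs of a toy approval rule
Weights = Tuple[float, float, float]


def legacy_approve(income: int, debt: int) -> int:
    """Previous, trusted version of the AUT (1 = approve, 0 = reject)."""
    return 1 if income - debt >= 1 else 0


def new_approve(income: int, debt: int) -> int:
    """New version under test, seeded with a threshold fault for illustration."""
    return 1 if income - debt >= 3 else 0


def train_perceptron(
    samples: Sequence[Sample], labels: Sequence[int], max_epochs: int = 10_000
) -> Weights:
    """Perceptron learning rule; stops once every training pair is classified correctly."""
    w1 = w2 = bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), label in zip(samples, labels):
            target = 1 if label == 1 else -1
            if target * (w1 * x1 + w2 * x2 + bias) <= 0:
                w1 += target * x1
                w2 += target * x2
                bias += target
                mistakes += 1
        if mistakes == 0:
            break
    return w1, w2, bias


def oracle_output(weights: Weights, income: int, debt: int) -> int:
    """Expected output produced by the trained model."""
    w1, w2, bias = weights
    return 1 if w1 * income + w2 * debt + bias > 0 else 0


def fault_report(
    aut: Callable[[int, int], int], weights: Weights, inputs: Sequence[Sample]
) -> List[Sample]:
    """Comparison tool: inputs on which the AUT disagrees with the oracle."""
    return [x for x in inputs if aut(*x) != oracle_output(weights, *x)]


# Training phase: I/O pairs come from running the previous version of the AUT.
inputs = [(income, debt) for income in range(6) for debt in range(6)]
weights = train_perceptron(inputs, [legacy_approve(*x) for x in inputs])

# Testing phase: the trained model supplies expected outputs for the new version.
faults_in_new = fault_report(new_approve, weights, inputs)
faults_in_legacy = fault_report(legacy_approve, weights, inputs)
print(len(faults_in_new), len(faults_in_legacy))
```

Because the toy rule is linearly separable, the perceptron reproduces the legacy outputs exactly on the training inputs; a real application would need a held-out test set and a tolerance for model error, and any behaviour that was intentionally changed in the new version would be reported as a fault.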

information is available on how the required I/O pairs will be generated to train the neural network. All of the above studies assume that expected outputs are available. In addition, they did not evaluate their methods in real applications. Therefore, more studies need to be conducted to provide the essential dataset for modeling an ANN based test oracle at adequate cost, and to investigate its application to real software testing. It is also recommended to study the application of ANNs as oracles for testing non-data-centric applications.

F. Input/output Analysis Based Automatic Expected Output Generator

There have been a few studies on semi-automated expected output generation. Combinatorial testing tests all possible combinations of input values. Since the number of these combinations can be very large in practice, effective test data reduction is necessary. Note that it is important to maintain test quality. Previous combinatorial test reduction methods such as Orthogonal Arrays [20], Experiment Design [21] and Random Sampling [22] decreased the number of combinations, but their impact on testing quality is unknown. Korel and Schroeder [23, 24] proposed an approach that reduces combinatorial test data by evaluating the relationships between inputs and outputs. They claimed that I/O analysis and its application in test case reduction does not reduce testing quality. Manual I/O analysis may consist of program documentation analysis and interviews with developers. Automated I/O analysis methods are structural analysis and execution-oriented analysis. Structural analysis is either static or dynamic, and can be applied if testers have access to the source code. Static analysis examines the source code, and dynamic analysis examines the run-time information gathered from code execution. Static analysis may overestimate program dependencies, and dynamic analysis is unable to guarantee full detection of I/O relationships [25].
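Execution-oriented analysis, one of the automated analyses named above, changes input values, runs the program and observes which outputs change. The sketch below illustrates that idea only; it is not the algorithm of [23, 24]. The `program` function, its input domains and the helper names are hypothetical, and inputs are varied one at a time from a single base point, so relationships that only show up for other combinations can be missed.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple


def program(a: int, b: int, c: int) -> Tuple[int, int]:
    """Hypothetical AUT with two outputs: the first uses a and b, the second only c."""
    return a + b, c * 2


def io_relationships(
    aut: Callable[..., Tuple[int, ...]],
    domains: List[Sequence[int]],
    base: Sequence[int],
) -> Dict[int, Set[int]]:
    """Map each output index to the input indexes whose change altered that output."""
    base_outputs = aut(*base)
    relations: Dict[int, Set[int]] = {j: set() for j in range(len(base_outputs))}
    for i, domain in enumerate(domains):
        for value in domain:
            trial = list(base)
            trial[i] = value  # change exactly one input
            outputs = aut(*trial)
            for j, (before, after) in enumerate(zip(base_outputs, outputs)):
                if before != after:
                    relations[j].add(i)
    return relations


def combinations_per_output(
    relations: Dict[int, Set[int]], domains: List[Sequence[int]]
) -> Dict[int, int]:
    """Number of input combinations needed to cover each output's related inputs."""
    counts = {}
    for output_index, input_indexes in relations.items():
        count = 1
        for i in input_indexes:
            count *= len(domains[i])
        counts[output_index] = count
    return counts


domains = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
relations = io_relationships(program, domains, base=[0, 0, 0])
needed = combinations_per_output(relations, domains)
print(relations)  # {0: {0, 1}, 1: {2}}
print(needed)     # {0: 9, 1: 3}
```

For this toy program, exhaustive combinatorial testing would need 3 × 3 × 3 = 27 input combinations, while each output on its own depends on at most 9 of them, which is the kind of reduction that I/O relationship analysis aims for.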
Incomplete I/O relationship detection may result in an imperfect test oracle. Execution-oriented analysis is based on program execution. It finds I/O relationships by changing the input values and executing the program while observing the outputs. To put it differently, it can find the relationships between inputs and outputs by observing which outputs are affected by the

All of the methods mentioned here have advantages and disadvantages. Each of them automates part(s) of the test oracle. Table II provides a comparison between the proposed methods and summarizes their capabilities to automate oracle activities and their limitations. The comparison is made to explain how these methods can be applied to automate oracle activities. The table shows which oracle activities can be automated by each method. In addition, the limitations that the proposed approaches may face are explained. Finally, some other comparative criteria considered in Table II are:
1) The cost of the methods
2) The reliability of the methods
3) The type of testing that can be automated by the methods

V. CONCLUSION AND FUTURE WORK

A comparative study of existing automated test oracle methods is presented in this paper. First, the test oracle process and its activities were explained. Then, the automated oracle and its challenges were described. Expected output generation was discussed as the main challenge in developing an automated oracle. Finally, the advantages, disadvantages and limitations of the proposed methods were shown. As can be seen, all of the proposed automated oracles have limitations. For example, it is still not possible to completely automate the entire oracle process in non-regression testing with reasonable cost and reliability. If testers want to reduce test cost, they need to perform some of the activities manually. Therefore, cost and automation pull against each other. Recently, ANNs have been widely considered a prominent approach to automated test oracles [16-19, 27]. The main problem with the existing ANN based test oracles is that they cannot automate the expected output generation activity except in regression testing. Therefore, testers need other automation methods to use together with ANNs to provide a complete automation framework. In addition, ANN based test oracles cannot be reliable if the software is non-deterministic.
Finally, it seems that there is still no single approach that completely automates all oracle activities in all circumstances. Some of the oracle activities can be automated by the proposed approaches under specific testing methods. Most of these methods can verify data-centric outputs but not action-centric outputs. Therefore, it is



recommended that further research be undertaken to develop a comprehensive automated test oracle that is applicable in any type of software testing while automating all oracle activities.


Table II. Comparison of the proposed methods

Method A: N-Version Diverse Systems and M-Model Program Testing
Automation tool: Various implementations
Automated oracle activities:
• Expected output generation
• Saving the generated outputs
• Searching for generated outputs
• Comparison
Limitations:
• Requires various implementations of system functionalities
• High cost
• Could not test flow of events
• Still not reliable

Method B: Decision Table
Automated oracle activities:
• Saving the generated outputs
• Searching for generated outputs (if a database is used to save the inputs and outputs)
Limitations:
• Manual output generation
• No automated comparator
• Only provides a structured approach for saving and indexing the I/O pairs

Method C: IFN Regression Tester
Automated oracle activities:
• Expected output generation
• Saving the generated outputs
• Searching for generated outputs
• Comparison
Limitations:
• Only applicable in regression testing
• Requires a reliable legacy system
• Requires additional knowledge for IFN modeling
• Could not test flow of events

Method D: AI Planner Test Oracle
Automation tool: AI planning
Automated oracle activities:
• Expected output generation
• Saving the generated outputs
• Searching for generated outputs
• Comparison
Limitations:
• Based on GUI state comparison
• Only applicable in GUI testing
• Requires a formal model of the GUI to be tested
• Requires reliable documentation to provide a trusted formal model
• High cost

Method E: ANN Based Test Oracle
Automated oracle activities:
• Expected output generation (only in regression testing)
• Saving the generated outputs
• Searching for expected outputs
• Comparison
Limitations:
• Manual expected output generation (will be automated in regression testing)
• Requires additional knowledge
• Reliable only in data-centric applications
• Could not test flow of events
• Still not reliable in non-deterministic applications

Method F: Input/output Analysis Based Automatic Expected Output Generator
Automation tool: I/O relationship analysis
Automated oracle activities:
• Expected output generation
Limitations:
• Requires I/O relationships
• Automatic I/O relationship analysis cannot guarantee to find all of the relations
• Cannot completely automate the expected output generation
• Cannot test flow of events


[1] Ammann, P., and Offutt, J.: ‘Introduction to Software Testing’ (Cambridge University Press, 2008, 1st edn.)
[2] Xie, Q., and Memon, A.M.: ‘Designing and comparing automated test oracles for GUI-based software applications’, ACM Transactions on Software Engineering and Methodology, 2007, 16, (1), pp. 4
[3] Peters, D., and Parnas, D.L.: ‘Generating a test oracle from program documentation’ (ACM, 1994), pp. 58
[4] Bousquet, L.d., Ouabdesselam, F., Richier, J.L., and Zuanon, N.: ‘Lutess: a specification-driven testing environment for synchronous software’. Proc. 21st International Conference on Software Engineering, Los Angeles, California, United States, 1999
[5] Dillon, L.K., and Ramakrishna, Y.S.: ‘Generating oracles from your favorite temporal logic specifications’, SIGSOFT Softw. Eng. Notes, 1996, 21, (6), pp. 106-117
[6] Richardson, D.J., Aha, S.L., and O'Malley, T.O.: ‘Specification-based Test Oracles For Reactive Systems’ (1992), pp. 105-118
[7] Richardson, D.J.: ‘TAOS: Testing with Analysis and Oracle Support’. Proc. 1994 ACM SIGSOFT International Symposium on Software Testing and Analysis, Seattle, Washington, United States, 1994
[8] Hall, P.A.V.: ‘Towards testing with respect to formal specification’ (IEEE, 1988), pp. 159-163
[9] Hall, P.A.V.: ‘Relationship between specifications and testing’, Information and Software Technology, 1991, 33, (1), pp. 47-52
[10] Stocks, P., and Carrington, D.: ‘A framework for specification-based testing’, IEEE Transactions on Software Engineering, 1996, 22, (11), pp. 777-793
[11] Manolache, L.I., and Kourie, D.G.: ‘Software testing using model programs’, Software - Practice and Experience, 2001, 31, (13), pp. 1211-1236
[12] Di Lucca, G.A., Fasolino, A.R., Faralli, F., and De Carlini, U.: ‘Testing Web applications’ (2002), pp. 310-319
[13] Last, M., and Friedman, M.: ‘Black-Box Testing with Info-Fuzzy Networks’, in Last, M., Kandel, A., and Bunke, H. (Eds.): ‘Artificial Intelligence Methods in Software Testing’ (World Scientific, 2004), pp. 21-50
[14] Last, M., Friedman, M., and Kandel, A.: ‘Using data mining for automated software testing’, International Journal of Software Engineering and Knowledge Engineering, 2004, 14, (4), pp. 369-393
[15] Memon, A.M., Pollack, M.E., and Soffa, M.L.: ‘Automated test oracles for GUIs’, SIGSOFT Softw. Eng. Notes, 2000, 25, (6), pp. 30-39
[16] Vanmali, M., Last, M., and Kandel, A.: ‘Using a neural network in the software testing process’, International Journal of Intelligent Systems, 2002, 17, (1), pp. 45-62
[17] Aggarwal, K.K., Singh, Y., Kaur, A., and Sangwan, O.P.: ‘A Neural Net based Approach To Test Oracle’, ACM Software Engineering Notes, 2004
[18] Mao, Y., Boqin, F., Li, Z., and Yao, L.: ‘Neural networks based automated test oracle for software testing’ (Springer Verlag, Heidelberg, D-69121, Germany, 2006), pp. 498-507
[19] Lu, Y., and Ye, M.: ‘Oracle model based on RBF neural networks for automated software testing’, Information Technology Journal, 2007, 6, (3), pp. 469-474
[20] Phadke, M.S.: ‘Planning Efficient Software Tests’, CrossTalk, 1997, Vol. 10 No. 10, (October 1997), pp. 11-15
[21] Dalal, S.R., Jain, A., Karunanithi, N., Leaton, J.M., and Lott, C.M.: ‘Model-based testing of a highly programmable system’ (IEEE, 1998), pp. 174-178
[22] Chen, T.Y., and Yu, Y.T.: ‘On the expected number of failures detected by subdomain testing and random testing’ (IEEE, 1996), pp. 109-119
[23] Korel, B., and Schroeder, P.J.: ‘Maintaining the Quality of Black-Box Testing’, The Journal of Defense Software Engineering, 2001, Vol. 14 No. 5, (May 2001), pp. 24-28
[24] Schroeder, P.J., and Korel, B.: ‘Black-box test reduction using input-output analysis’, SIGSOFT Softw. Eng. Notes, 2000, 25, (5), pp. 173-177
[25] Schroeder, P.J., and Korel, B.: ‘Black-box test reduction using input-output analysis’, SIGSOFT Softw. Eng. Notes, 2000, 25, (5), pp. 173-177
[26] Schroeder, P.J., Faherty, P., and Korel, B.: ‘Generating expected results for automated black-box testing’ (2002), pp. 139-148
[27] Hu, J., Yi, W., Nian-Wei, C., Zhi-Jian, G., and Shuo, W.: ‘Artificial Neural Network for Automatic Test Oracles Generation’. Proc. 2008 International Conference on Computer Science and Software Engineering - Volume 02, 2008

